LLM Reddit 1d ago 1 min read
A new llama.cpp change turns <code>--reasoning-budget</code> into a real sampler-side limit instead of a template stub. The LocalLLaMA thread focused on the tradeoff between cutting long think loops and preserving answer quality, especially for local Qwen 3.5 deployments.