r/LocalLLaMA Tries to Standardize Practical Qwen3.5 Presets

Original: Qwen3.5 Best Parameters Collection

LLM · Mar 20, 2026 · By Insights AI (Reddit)

On March 20, 2026, a r/LocalLLaMA thread titled "Qwen3.5 Best Parameters Collection" reached 123 points and 47 comments. The timing matters because Qwen3.5 had been out for a few weeks: enough time for quantizations, runtimes, and sampler settings to settle, but still early enough that many users were comparing notes rather than following a stable consensus. The original post asked for working presets by use case and shared one starting configuration for Qwen3.5-35B-A3B on llama.cpp v8400, using temp 0.7, top-p 0.8, top-k 20, presence penalty 1.5, repeat penalty 1.0, and a reasoning budget of 1000 for general chat.
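The posted starting configuration maps naturally onto a chat-completion request against a local llama.cpp server. The sketch below is an assumption-laden illustration, not the poster's actual client: `top_k` and `repeat_penalty` are llama.cpp's extension fields on its OpenAI-compatible endpoint, and the `reasoning_budget` field is a hypothetical name, since how a reasoning budget is passed varies by runtime.

```python
# Sketch: the thread's general-chat starting preset expressed as a request
# payload for a llama.cpp OpenAI-compatible server. Standard OpenAI fields
# plus llama.cpp extensions; "reasoning_budget" is a hypothetical field name,
# since the actual knob is runtime-specific.

def general_chat_payload(messages):
    """Build a chat-completion payload using the post's starting preset."""
    return {
        "messages": messages,
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,              # llama.cpp extension field
        "presence_penalty": 1.5,
        "repeat_penalty": 1.0,    # llama.cpp extension field
        "reasoning_budget": 1000, # hypothetical: name varies by runtime
    }

payload = general_chat_payload([{"role": "user", "content": "Hello"}])
print(payload["temperature"])  # 0.7
```

Sending this payload to a running `llama-server` instance would exercise the preset as described; the point of the sketch is only that the whole configuration fits in one request-level object, which is what makes per-task presets cheap to swap.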

What the thread actually surfaced

  • Many commenters said the safest baseline is still the official Qwen recommendations in model cards rather than Reddit folklore.
  • Several users shared different presets for different jobs: thinking-mode coding, thinking-mode general chat, instruct-mode creative writing, and instruct-mode coding.
  • Reasoning budgets became a major tuning axis, with examples ranging from 4096 to 16384 depending on document length and tolerance for long chains of thought.
  • For tool-calling work, some users reported better results in non-thinking mode with tighter repeat penalties, arguing that long reasoning traces slowed the system without improving outcomes.

That pattern is more interesting than any single parameter list. The LocalLLaMA community is treating inference policy as a first-class layer of model performance. The same checkpoint can feel verbose, unstable, or highly capable depending on whether it is asked to code, chat, call tools, or parse a long document. In other words, the argument is shifting from "Which model wins?" to "What operating profile makes this model useful?"

Why the thread matters

Open-weight ecosystems usually go through the same maturity curve. First the attention is on raw benchmark strength. Then it moves to quant quality, runtime support, and context length. After that, users discover that default sampler settings hide a large part of real-world performance. This thread sits squarely in that third phase. It does not produce one universal preset, but it does show a community converging on a more disciplined approach: start from official settings, then branch by task type and reasoning budget instead of chasing a single magic configuration.

That is useful for anyone evaluating local LLM stacks on consumer GPUs. A model that "thinks too much" in general chat may still be the right choice for coding or document analysis if the sampler and reasoning budget are adjusted correctly. The thread is less a leaderboard update than a sign that Qwen3.5 is entering the phase where operating practice matters almost as much as weights.

Sources: r/LocalLLaMA discussion · Unsloth Qwen3.5 documentation
