LocalLLaMA Wants Qwen3.5-9B Quant Choices Backed by KLD, Not Vibes
Original: Updated Qwen3.5-9B Quantization Comparison
The LocalLLaMA comparison for Qwen3.5-9B quantizations landed because it solved a very practical problem: there are too many GGUF files, and their filenames alone are not enough guidance. Instead of telling users to pick a popular upload, the post compares community quants against a BF16 baseline using mean KLD (Kullback–Leibler divergence). In the author's framing, lower KLD means the quantized model's token probability distribution stays closer to that of the full-precision model.
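For reference, mean KLD here can be read as the average token-level Kullback–Leibler divergence between the full-precision next-token distribution \(p_t\) and the quantized model's distribution \(q_t\). A minimal formulation, assuming the average runs over \(T\) token positions and vocabulary \(V\) (the post does not spell out its exact averaging):

\[
\overline{\mathrm{KLD}} \;=\; \frac{1}{T} \sum_{t=1}^{T} \sum_{v \in V} p_t(v)\,\log\frac{p_t(v)}{q_t(v)}
\]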
That metric choice is what gave the thread its technical weight. Perplexity can be noisy and dataset-sensitive; it can improve by accident on a particular test slice even when the model has drifted. KLD is not magic, but it directly asks how far the quantized distribution has moved from the baseline. For local users choosing among Q8_0, Q4 variants, i-quants, and provider-specific builds, that is a more useful starting point than file size alone.
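To make that concrete, here is a minimal sketch of the computation, assuming you have already captured per-token logits from the baseline and the quantized model on the same prompts. The arrays and the function name `mean_kld` are illustrative, not the post's actual tooling; in practice, llama.cpp's perplexity tool offers a KL-divergence mode that handles this end to end.

```python
import numpy as np

def mean_kld(baseline_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean token-level KL(p_bf16 || p_quant).

    Both inputs are hypothetical (tokens, vocab) logit matrices captured
    from the same prompts on the baseline and the quantized model.
    """
    def log_softmax(x: np.ndarray) -> np.ndarray:
        # Numerically stable log-softmax over the vocabulary axis.
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    log_p = log_softmax(baseline_logits.astype(np.float64))
    log_q = log_softmax(quant_logits.astype(np.float64))
    p = np.exp(log_p)
    # Per-token KL(p || q), then average over token positions.
    kld_per_token = (p * (log_p - log_q)).sum(axis=-1)
    return float(kld_per_token.mean())
```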
The table highlighted near-lossless Q8-style options at the top, with multiple entries scoring below 0.01 mean KLD. Commenters treated that as a shared reference rather than a final answer. Some asked for runs on Gemma 4 and larger Qwen models. Others suggested improving the chart with distinct marker shapes for different quant publishers. A longer technical comment praised the efficiency calculation while asking for KLD measured at near-full context lengths, because quantization can hurt long-context behavior even when short-context numbers look fine.
That is the community energy here: LocalLLaMA is moving from casual model recommendations toward repeatable measurement. The post does not crown one universal best quant. It gives users a way to talk about tradeoffs among file size, bits per weight (BPW), KLD, perplexity (PPL), memory fit, and workload. For local inference, that is often the difference between chasing a filename and making an informed deployment choice.
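As a small illustration of how that tradeoff talk can become a concrete filter, here is a sketch that keeps only the quants on the size-versus-KLD Pareto frontier. The `Quant` fields and any entries you feed it are hypothetical placeholders, not numbers from the post.

```python
from typing import NamedTuple

class Quant(NamedTuple):
    name: str       # hypothetical GGUF filename
    size_gb: float  # file size on disk
    kld: float      # mean KLD vs. the BF16 baseline

def pareto_frontier(quants: list[Quant]) -> list[Quant]:
    """Keep only quants whose KLD beats every quant of equal or smaller size."""
    frontier, best_kld = [], float("inf")
    for q in sorted(quants, key=lambda q: (q.size_gb, q.kld)):
        if q.kld < best_kld:  # strictly better fidelity than anything smaller
            frontier.append(q)
            best_kld = q.kld
    return frontier
```

A quant survives only if no equal-or-smaller file matches or beats its KLD, which mirrors how commenters frame picking a quant for a given memory budget.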
Related Articles
r/LocalLLaMA liked this comparison because it replaces reputation and anecdote with a more explicit distribution-based yardstick. The post ranks community Qwen3.5-9B GGUF quants by mean KLD versus a BF16 baseline, with Q8_0 variants leading on fidelity and several IQ4/Q5 options standing out on size-to-drift trade-offs.
The LocalLLaMA thread cared less about a release headline and more about which Qwen3.6 GGUF quant actually works. Unsloth’s benchmark post pushed the discussion into KLD, disk size, CUDA 13.2 failures, and the messy details that decide local inference quality.
A Reddit post in r/LocalLLaMA introduces a GGUF release of Qwen3.5-122B-A10B Uncensored (Aggressive) alongside new K_P quants. The author claims 0/465 refusals and zero capability loss, but those results are presented as the author’s own tests rather than independent verification.