r/LocalLLaMA Re-ranks Qwen3.5-9B Quants With KLD Instead of Guesswork
Original: Updated Qwen3.5-9B Quantization Comparison
Quantization comparison posts are common on r/LocalLLaMA, but many of them still end up as reputation contests or machine-specific anecdotes. This one landed because it tried to give the community something more portable: a distribution-based way to judge how far a quant drifts from the original model. The author ranks Qwen3.5-9B community GGUF files by mean KLD against a BF16 baseline and frames that as a cleaner measure of faithfulness than “it felt good on my box.” That is exactly the kind of utility post the subreddit tends to reward.
The argument in the post is straightforward. Perplexity can be noisy because it depends on a particular evaluation set, so scores can move around for reasons that have little to do with how much information quantization actually destroyed. KLD, by contrast, compares the quantized model’s next-token probability distribution directly against the baseline’s at every position, so it measures drift from the original model rather than performance on one text sample. In the ranking table, Q8_0 variants dominate the top of the fidelity chart: eaddario’s Q8_0 comes in at 0.001198 KLD, unsloth’s UD-Q8_K_XL at 0.001243, and bartowski’s Q8_0 at 0.001405. Once file size is folded back in through the post’s efficiency score, a different set of winners emerges, with several IQ4_XS, IQ4_NL, and Q5_K_S options looking more attractive for real-world memory budgets.
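To make the metric concrete, here is a minimal sketch of what a mean-KLD comparison computes, assuming you already have per-token logits from the BF16 baseline and a quantized model over the same evaluation text. This is the idea behind the ranking, not the author’s exact ik_llama.cpp pipeline, and the array names are illustrative:

```python
# Mean KL divergence between a baseline model and a quantized model,
# averaged over token positions. Both inputs are raw logits with shape
# (num_tokens, vocab_size) over the same evaluation text.
import numpy as np

def mean_kld(base_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    # Numerically stable log-softmax.
    def log_softmax(x: np.ndarray) -> np.ndarray:
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    log_p = log_softmax(base_logits)   # reference (BF16) distribution
    log_q = log_softmax(quant_logits)  # quantized distribution
    p = np.exp(log_p)
    # KL(P || Q) = sum_i p_i * (log p_i - log q_i), then average over tokens.
    kld_per_token = (p * (log_p - log_q)).sum(axis=-1)
    return float(kld_per_token.mean())
```

A mean KLD near zero, like the 0.0012 figures at the top of the chart, means the quantized model assigns almost the same probabilities as the BF16 original at nearly every position.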
The practical value is in the details the author included. This was not just a chart drop. The post also lists the evaluation dataset, the run configuration of 103 chunks at a context length of 512 (-c 512), the exact ik_llama.cpp build, and NVIDIA driver version 595.97. That is why the comments immediately moved toward “do Gemma 4 next,” “what about MoE,” and “please add i1 quants.” People were treating the work as a reusable benchmark scaffold, not a one-off screenshot. One commenter even pointed out that mradermacher’s i1 quants seem to punch above their weight, which is the kind of concrete follow-up you only get when readers trust the setup.
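For readers who want to treat the setup as a scaffold, a rough driver might look like the sketch below. It assumes the two-pass KL-divergence workflow that mainline llama.cpp’s llama-perplexity tool exposes (--kl-divergence-base saves the baseline’s logits, --kl-divergence replays them against a quant); the binary path, file names, and output parsing are placeholders, not the author’s exact ik_llama.cpp invocation, and the fork’s output format may differ:

```python
# Hypothetical driver: save BF16 logits once, then score each quant file.
import re
import subprocess
from pathlib import Path

PERPLEXITY_BIN = "./llama-perplexity"  # assumed path to the built tool
EVAL_TEXT = "eval.txt"                 # the post's linked evaluation dataset
BASE_LOGITS = "bf16_logits.bin"

def save_baseline(base_model: str) -> None:
    # Pass 1: run the BF16 baseline once and save its per-token logits.
    subprocess.run([PERPLEXITY_BIN, "-m", base_model, "-f", EVAL_TEXT,
                    "-c", "512", "--kl-divergence-base", BASE_LOGITS],
                   check=True)

def kld_for_quant(quant_path: str) -> float:
    # Pass 2: replay the saved baseline logits against a quantized file.
    out = subprocess.run([PERPLEXITY_BIN, "-m", quant_path, "-c", "512",
                          "--kl-divergence-base", BASE_LOGITS,
                          "--kl-divergence"],
                         check=True, capture_output=True, text=True)
    # Illustrative parsing: the exact label varies between builds and forks.
    match = re.search(r"Mean\s+KLD:\s*([0-9.]+)", out.stdout)
    if match is None:
        raise RuntimeError(f"could not find mean KLD for {quant_path}")
    return float(match.group(1))

if __name__ == "__main__":
    save_baseline("qwen3.5-9b-bf16.gguf")  # hypothetical file name
    for gguf in sorted(Path("quants").glob("*.gguf")):
        print(f"{gguf.name}: {kld_for_quant(str(gguf)):.6f}")
```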
The useful readout is simple. If your top priority is minimal drift from BF16, Q8_0-class files still look strongest. If you care more about the size-to-fidelity balance, the post suggests that several IQ4 and Q5 variants deserve more attention than the community usually gives them. The original discussion is on r/LocalLLaMA, and the evaluation dataset is linked from the post as a gist. The energy around this thread comes from a familiar frustration: local-LLM users are tired of choosing quants by folklore and want something closer to a measurement culture.
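The post’s exact efficiency formula is not reproduced here, but the size-to-fidelity idea is easy to sketch: score each quant by how much fidelity it buys per gigabyte, so that a small file with modest drift can outrank a large file with tiny drift. Everything in the snippet below, including the labels, KLD values, and file sizes, is a hypothetical stand-in for illustration, not data from the post:

```python
import math

# Illustrative only: labels, KLD values, and sizes are invented stand-ins.
quants = [
    ("Q8_0-class",   0.0012, 9.8),  # (label, mean KLD, size in GB)
    ("Q5_K_S-class", 0.0061, 6.3),
    ("IQ4_XS-class", 0.0150, 5.0),
]

def efficiency(mean_kld: float, size_gb: float) -> float:
    # One plausible size-aware score (not the post's actual formula):
    # orders of magnitude of KLD suppressed, per gigabyte of file size.
    return -math.log10(mean_kld) / size_gb

for name, kld, size in sorted(quants, key=lambda q: -efficiency(q[1], q[2])):
    print(f"{name:13s} KLD={kld:.4f} size={size:.1f}GB "
          f"score={efficiency(kld, size):.3f}")
```

Under a score like this, the IQ4-class entry comes out ahead despite its higher KLD, which is the shape of the trade-off the post’s efficiency ranking surfaces.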
Related Articles
A Reddit post in r/LocalLLaMA introduces a GGUF release of Qwen3.5-122B-A10B Uncensored (Aggressive) alongside new K_P quants. The author claims 0/465 refusals and zero capability loss, but those figures come from the author’s own tests rather than independent verification.
A popular r/LocalLLaMA post highlighted a community merge of uncensored and reasoning-distilled Qwen 3.5 9B checkpoints, underscoring the appetite for behavior-tuned small local models.
A recent r/LocalLLaMA post presents Qwen3.5 27B as an unusually strong local inference sweet spot. The author reports about 19.7 tokens per second on an RTX A6000 48GB with llama.cpp and a 32K context, while the comments turn into a detailed debate about dense-versus-MoE VRAM economics.