Alibaba Releases Qwen3.5 Small Models: 9B Achieves GPT-oss 20B–120B Level Performance

Original: Breaking: The small qwen3.5 models have been dropped

LLM · Mar 2, 2026 · By Insights AI (Reddit) · 1 min read

Overview

Alibaba's Qwen team has released the Qwen3.5 small model series, comprising three sizes: 0.8B, 4B, and 9B parameters. All models are immediately available on Hugging Face with GGUF quantizations from unsloth and community contributors.

Key Performance

Community benchmarks show the Qwen3.5 9B model performing at a level comparable to GPT-oss models in the 20B–120B range: exceptional parameter efficiency that brings high-quality local inference within reach of mid-range consumer GPUs.

The 0.8B model targets mobile deployment, while the 4B model offers a compelling middle ground. The community quickly noted that disabling thinking mode and setting the temperature to around 0.45 yields the best results, as the models tend to overthink on reasoning tasks. Additionally, a bf16 KV cache (rather than f16) is recommended for best results on engines such as llama.cpp.
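As a rough sketch, those community-recommended settings map onto llama.cpp flags like the ones below. The GGUF filename is hypothetical, and the `/no_think` prompt tag follows the convention earlier Qwen models used for disabling thinking mode; neither is a confirmed detail of this release.

```shell
# Sketch of a llama.cpp invocation with the community-recommended settings.
# The model filename is illustrative; substitute the GGUF you actually downloaded.
# --cache-type-k / --cache-type-v select the KV cache precision (bf16, not f16).
llama-cli \
  -m Qwen3.5-9B-Q4_K_M.gguf \
  --temp 0.45 \
  --cache-type-k bf16 \
  --cache-type-v bf16 \
  -p "/no_think Explain KV caching in one sentence."
```

The same `--cache-type-*` and `--temp` options apply to `llama-server` if you prefer an OpenAI-compatible endpoint over the CLI.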

Community Reception

The LocalLLaMA community responded with immediate enthusiasm, with quantized versions appearing within hours of release. Multiple benchmark comparisons against Qwen 3 predecessors are already being shared, showing clear improvements across standard evaluation metrics.

Availability

Models are available at Hugging Face under the Qwen organization, with GGUF variants from unsloth available for llama.cpp and compatible runtimes.
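Fetching one of the community quantizations might look like the following; the repository and file names here are illustrative guesses, so check the actual listings under the Qwen and unsloth organizations on Hugging Face.

```shell
# Hypothetical download of a Qwen3.5 GGUF quantization via the Hugging Face CLI.
# Repo and filename are assumptions for illustration, not confirmed names.
huggingface-cli download \
  unsloth/Qwen3.5-9B-GGUF \
  Qwen3.5-9B-Q4_K_M.gguf \
  --local-dir ./models
```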


© 2026 Insights. All rights reserved.