Qwen 2.5 → 3 → 3.5: How Alibaba's Smallest Models Have Transformed Across Generations

Original post: "Qwen 2.5 -> 3 -> 3.5, smallest models. Incredible improvement over the generations."

LLM · Mar 3, 2026 · By Insights AI (Reddit)

Three Generations of Density Improvements

Alibaba's Qwen model family has seen extraordinary efficiency gains across generations. A community comparison post on r/LocalLLaMA (score: 681) highlighted just how much has changed from Qwen 2.5 to Qwen 3 to Qwen 3.5 in the smallest model tiers.

Qwen 3 vs. Qwen 2.5

Qwen 3 achieved an approximately 50% density improvement over Qwen 2.5: Qwen3-1.7B performs comparably to Qwen2.5-3B, Qwen3-4B to Qwen2.5-7B, and so on up the scale. In other words, users can now get the same performance at roughly half the parameter count.
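As a back-of-envelope check on that claim, the size pairings from the post can be expressed as a parameter-savings calculation. The pairings are from the community comparison; the helper function and its name are illustrative, not part of any Qwen tooling.

```python
# Qwen 3 size -> Qwen 2.5 size said to have comparable performance
# (billions of parameters; pairings taken from the community post).
QWEN3_TO_QWEN25_EQUIV = {
    1.7: 3.0,
    4.0: 7.0,
}

def params_saved(new_params_b: float, old_params_b: float) -> float:
    """Fraction of parameters saved for comparable performance."""
    return 1.0 - new_params_b / old_params_b

for new, old in QWEN3_TO_QWEN25_EQUIV.items():
    print(f"Qwen3-{new}B vs Qwen2.5-{old}B: "
          f"{params_saved(new, old):.0%} fewer parameters")
```

Both pairings come out to roughly 43% fewer parameters, in line with the post's "roughly half the parameter count" framing.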

Qwen 3.5 Small Series (0.8B–9B)

Qwen 3.5's small models (0.8B, 2B, 4B, 9B) are all natively multimodal with 262K context. Performance highlights include:

  • The 9B model scores 81.7 on GPQA Diamond, outperforming the previous-gen 80B model (77.2)
  • The 9B beats GPT-5-Nano by 13+ points on MMMU-Pro and 30+ points on document understanding
  • The 2B model scores 84.5 on OCRBench and 75.6 on VideoMME, surpassing many 7B-class models
  • The 4B model can handle text, images, and video on just 8GB of VRAM
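The 8GB figure for the 4B model is plausible with quantized weights. A minimal sketch of the arithmetic, assuming 4-bit weight quantization (the function and constants below are a rough estimate, not an official Qwen memory breakdown):

```python
def weight_vram_gb(n_params_billions: float, bits_per_param: float) -> float:
    """Approximate GB needed to hold model weights alone
    (1 GB taken as 1e9 bytes; ignores activations and KV cache)."""
    return n_params_billions * 1e9 * bits_per_param / 8 / 1e9

# A 4B model at 4-bit needs ~2 GB for weights, leaving headroom in an
# 8 GB card for activations, KV cache, and the vision encoder; at fp16
# the weights alone would consume the full 8 GB.
print(weight_vram_gb(4.0, 4))   # 4-bit quantized weights
print(weight_vram_gb(4.0, 16))  # fp16 weights
```

This is why quantization matters for the "runs locally" claim: the remaining VRAM budget, not the weights themselves, becomes the constraint at longer contexts.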

Why This Matters

This trajectory shows how quickly the open-source LLM ecosystem is advancing. Capabilities that once required proprietary models with 70B+ parameters are now achievable with locally runnable models. For the local AI community, Qwen 3.5 is setting a new standard for what small open-source models can do.
