Qwen 3.5 Small Models Released: From 0.8B to 9B, Now Running in Browsers
Original post: "Breaking: The small qwen3.5 models have been dropped"
Qwen 3.5 Small Models Drop
Alibaba's Qwen team has released the Qwen 3.5 small model series to massive community excitement; the announcement thread reached a score of 1,663 on r/LocalLLaMA, one of the highest seen for a model release. The lineup includes 0.8B, 2B, 4B, and 9B parameter models.
Hybrid Architecture Innovation
Qwen 3.5 introduces a hybrid architecture combining Gated DeltaNet layers with standard Gated Attention. The 9B model features 32 layers and a 4096-dimensional hidden state, with an integrated vision encoder enabling multimodal capabilities. The linear-attention DeltaNet layers replace quadratic self-attention with a fixed-size recurrent state, so per-token compute and memory stay constant with sequence length rather than growing with a KV cache.
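To make the linear-attention point concrete, here is a minimal single-head sketch of the gated delta rule that DeltaNet-style layers are built on. This is an illustration of the general technique, not Qwen's implementation: the real layers are multi-head, use learned data-dependent gates, and run chunked-parallel kernels, all of which are simplified away here.

```python
import torch

def gated_delta_step(S, q, k, v, alpha, beta):
    """One token of a simplified, single-head gated delta rule.

    S     : (d_v, d_k) fast-weight state carried across the sequence
    q, k  : (d_k,) query / key for this token (k assumed unit-norm)
    v     : (d_v,) value for this token
    alpha : scalar in (0, 1), decay gate that forgets old associations
    beta  : scalar in (0, 1), write strength for the new association
    """
    S = alpha * S                               # gated decay of the state
    v_pred = S @ k                              # value currently stored for k
    S = S + beta * torch.outer(v - v_pred, k)   # delta-rule error correction
    o = S @ q                                   # read out with the query
    return S, o

# The state S is fixed-size, so a whole sequence is processed in O(n) time
# and O(1) memory per layer, unlike softmax attention's growing KV cache.
d_k, d_v, seq_len = 64, 64, 8
S = torch.zeros(d_v, d_k)
for _ in range(seq_len):
    q = torch.randn(d_k)
    k = torch.nn.functional.normalize(torch.randn(d_k), dim=0)
    v = torch.randn(d_v)
    S, o = gated_delta_step(S, q, k, v, alpha=0.95, beta=0.5)
```

In a hybrid stack, blocks like this are interleaved with ordinary attention layers, which retain exact token-to-token recall where the recurrent state alone would be lossy.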
Remarkable Small Model Performance
The 0.8B model runs directly in browsers via WebGPU using Transformers.js, and can execute locally on seven-year-old Android hardware such as the Samsung Galaxy S10e. Community benchmarks show substantial gains over the equivalent Qwen 3 models at every size and in every reported category.
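The article does not include the demo code, but running a model of this class looks roughly like the following with the Python transformers library (the browser demo instead uses Transformers.js with WebGPU). The model ID below is an assumption for illustration; check the Qwen organization on the Hugging Face Hub for the published checkpoint names.

```python
from transformers import pipeline

# "Qwen/Qwen3.5-0.8B-Instruct" is a hypothetical ID used for illustration;
# substitute the real checkpoint name from the Qwen org on the Hub.
pipe = pipeline("text-generation", model="Qwen/Qwen3.5-0.8B-Instruct")

messages = [{"role": "user", "content": "Explain WebGPU in one sentence."}]
result = pipe(messages, max_new_tokens=64)

# The pipeline returns the full chat; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```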
Practical Deployment Options
The 9B proves capable at agentic coding tasks, while the 4B runs on a Raspberry Pi 5. The 2B excels at OCR, and the 0.8B sets a new bar for on-device AI on Android. Unsloth rapidly released optimized GGUF quantizations, making these models immediately accessible via llama.cpp and other runtimes.
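As a sketch of what the GGUF route looks like with llama-cpp-python: the repository and file names below are assumptions modeled on Unsloth's usual naming scheme, so verify the actual paths on the Hugging Face Hub before running.

```python
from llama_cpp import Llama

# Hypothetical repo id and quant filename, following Unsloth's typical naming.
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3.5-4B-Instruct-GGUF",
    filename="*Q4_K_M.gguf",   # 4-bit quant, a common CPU/edge default
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a haiku about edge AI."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```

A Q4_K_M quant of a 4B model needs roughly 3 GB of RAM, which is what makes Raspberry Pi 5-class hardware a plausible target.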
Impact on Open-Source AI
This release reinforces the trajectory of small open-source models closing the gap with much larger proprietary systems. With capable models now running in browsers, on phones, and on edge hardware without cloud APIs, the democratization of AI inference is accelerating rapidly.
Related Articles
Alibaba released the Qwen3.5 small model series (0.8B, 4B, 9B). The 9B model reportedly achieves performance comparable to OpenAI's gpt-oss models (20B–120B), making high-quality local inference accessible to users with modest GPU hardware.
Alibaba launched Qwen3.5, a 397B-parameter open-weight multimodal model supporting 201 languages. The company claims it outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 on benchmarks, while costing 60% less than its predecessor.
A widely-shared r/LocalLLaMA comparison of Qwen's smallest models across three generations (score: 681) reveals extraordinary efficiency gains. The Qwen 3.5 9B now outperforms the previous-generation 80B on several benchmarks, while the 2B handles video understanding better than many 7B models.