Qwen 3.5 Small Models Released: From 0.8B to 9B, Now Running in Browsers

Original: "Breaking: The small qwen3.5 models have been dropped"

LLM · Mar 3, 2026 · By Insights AI (Reddit) · 1 min read

Qwen 3.5 Small Models Drop

Alibaba's Qwen team has released the Qwen 3.5 small model series to massive community excitement, garnering a score of 1,663 on r/LocalLLaMA — one of the highest scores seen for a model release. The lineup includes 0.8B, 2B, 4B, and 9B parameter models.

Hybrid Architecture Innovation

Qwen 3.5 introduces a hybrid architecture combining Gated DeltaNet layers with standard Gated Attention. The 9B model features 32 layers and 4096 hidden dimensions, with an integrated vision encoder enabling multimodal capabilities. Because Gated DeltaNet is a linear-attention design, its cost grows linearly rather than quadratically with sequence length, which gives a significant efficiency gain over pure transformer architectures, particularly at long context.
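The exact interleaving of the two block types is not detailed here, but the idea can be sketched as a layer schedule in which most layers are linear-attention DeltaNet blocks and a minority are full attention. The 3:1 ratio below is an illustrative assumption, not a confirmed Qwen 3.5 configuration:

```python
# Sketch of a hybrid layer schedule mixing linear-attention (Gated DeltaNet)
# blocks with standard Gated Attention blocks. The 3:1 ratio is an
# illustrative assumption, not a confirmed Qwen 3.5 figure.

def hybrid_layer_schedule(num_layers: int, attention_every: int = 4) -> list[str]:
    """Return the block type for each of `num_layers` layers.

    Every `attention_every`-th layer is full Gated Attention; the rest
    are Gated DeltaNet, keeping most of the stack linear in sequence length.
    """
    return [
        "gated_attention" if (i + 1) % attention_every == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

schedule = hybrid_layer_schedule(32)      # 9B model: 32 layers
print(schedule.count("gated_deltanet"))   # 24 linear-attention layers
print(schedule.count("gated_attention"))  # 8 full-attention layers
```

With this schedule only a quarter of the layers pay the quadratic attention cost; the rest scale linearly with sequence length.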

Remarkable Small Model Performance

The 0.8B model runs directly in browsers via WebGPU using Transformers.js, and can execute locally on seven-year-old Android devices such as the Samsung Galaxy S10e. Community benchmarks show substantial gains over similarly sized Qwen 3 models in every reported category.

Practical Deployment Options

The 9B model proves capable at agentic coding tasks, the 4B runs on a Raspberry Pi 5, the 2B excels at OCR, and the 0.8B sets a new bar for on-device AI on Android. Unsloth rapidly released optimized GGUF quantizations, making these models immediately accessible via llama.cpp and other runtimes.
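A quick back-of-envelope calculation shows why these sizes fit edge hardware. Assuming roughly 4.5 bits per weight for a typical 4-bit GGUF quantization (an approximation; the exact figure varies by quant scheme):

```python
def quantized_weight_gib(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight footprint in GiB for a quantized model.

    bits_per_weight=4.5 roughly approximates a 4-bit GGUF quantization;
    KV cache and runtime overhead come on top of this figure.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

for size in (0.8, 2, 4, 9):
    print(f"{size}B -> ~{quantized_weight_gib(size):.2f} GiB")
```

On these assumptions the 4B model's weights come to roughly 2 GiB, comfortably inside a Raspberry Pi 5's 8 GB of RAM, and the 0.8B to under half a GiB, which is why it is plausible even on older Android phones.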

Impact on Open-Source AI

This release reinforces the trajectory of small open-source models closing the gap with much larger proprietary systems. With capable models now running in browsers, on phones, and on edge hardware without cloud APIs, the democratization of AI inference is accelerating rapidly.

