Qwen 3.5-35B-A3B Surpasses GPT-OSS-120B as Daily Driver at 1/3 the Size
Original: Qwen 3.5-35B-A3B is beyond expectations. It's replaced GPT-OSS-120B as my daily driver and it's 1/3 the size.
Qwen 3.5-35B-A3B Exceeds All Expectations
The LocalLLaMA community is rallying around Alibaba's Qwen 3.5-35B-A3B, with a highly-upvoted post declaring it has replaced GPT-OSS-120B as the poster's daily driver — at just one-third the size.
MoE Efficiency: 35B Parameters, 3B Active
The model employs a Mixture of Experts (MoE) architecture: 35 billion total parameters, of which only roughly 3 billion are active for any given token during inference. This makes it dramatically cheaper to run than a comparable dense model while preserving output quality.
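The "3B active" figure follows from top-k expert routing: a gating network scores all experts per token, but only the k highest-scoring experts actually run. A minimal sketch of that routing step (expert count, dimensions, and k here are illustrative assumptions, not Qwen's actual configuration):

```python
# Minimal sketch of MoE top-k routing. Only k of n experts run per token,
# so only ~k/n of the expert parameters are "active" at inference time.
import numpy as np

def moe_forward(x, expert_weights, gate_weights, k=2):
    """Route one token vector through the top-k of n experts."""
    logits = x @ gate_weights                      # gate scores, shape (n_experts,)
    top = np.argsort(logits)[-k:]                  # indices of the k best experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                           # renormalized gate weights
    # Only k expert matmuls execute; all other experts stay idle.
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts, k = 64, 16, 2                        # toy sizes, not Qwen's
x = rng.standard_normal(d)
experts = rng.standard_normal((n_experts, d, d))
gate = rng.standard_normal((d, n_experts))

y = moe_forward(x, experts, gate, k=k)
active_fraction = k / n_experts                    # fraction of expert params used per token
```

With 16 experts and k=2 in this toy setup, only 12.5% of the expert parameters touch each token; the same principle is what lets a 35B-total model behave like a ~3B model at inference time.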
Real-World Performance
The original poster shared specific production use cases where the model excels:
- Automated message and email triage via N8N with priority-based batching
- Agent systems with dynamic tool selection
- General-purpose development assistance
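The triage workflow above hinges on priority-based batching: group incoming items by priority, then process the highest-priority batches first. A hedged sketch of that idea (the priority labels, batch size, and message shape are assumptions, not the original poster's N8N setup):

```python
# Illustrative priority-based batching: sort messages by priority,
# then yield fixed-size batches, most urgent first.
from itertools import groupby

PRIORITY = {"urgent": 0, "normal": 1, "low": 2}  # assumed labels

def batch_by_priority(messages, batch_size=4):
    """Yield batches of messages, highest priority first."""
    ordered = sorted(messages, key=lambda m: PRIORITY[m["priority"]])
    for _, group in groupby(ordered, key=lambda m: m["priority"]):
        group = list(group)
        for i in range(0, len(group), batch_size):
            yield group[i:i + batch_size]

msgs = [{"id": 1, "priority": "low"},
        {"id": 2, "priority": "urgent"},
        {"id": 3, "priority": "normal"},
        {"id": 4, "priority": "urgent"}]

batches = list(batch_by_priority(msgs, batch_size=2))
# The first batch holds both urgent messages (ids 2 and 4).
```

Each batch would then be handed to the model as a single request, so urgent mail is classified before the backlog of low-priority items.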
The consensus in the thread is that Qwen 3.5-35B-A3B punches well above its weight class, particularly for coding and reasoning tasks.
Qwen 3.5 Family Context
The Qwen 3.5 series is Alibaba's latest open-source model family, available in sizes ranging from compact dense models to larger MoE variants. The 35B-A3B continues a trend of Chinese open-source models closing the gap with — and in some cases surpassing — Western counterparts at comparable parameter counts.
Related Articles
LocalLLaMA reacted because the post attacks a very real pain point for running large MoE models on limited VRAM. The author tested a llama.cpp fork that tracks recently routed experts and keeps the hot ones in VRAM for Qwen3.5-122B-A10B, reporting 26.8% faster token generation than layer-based offload at a similar 22GB VRAM budget.
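The hot-expert trick described above amounts to an LRU cache over expert weights: recently routed experts stay resident in VRAM, and the coldest expert is evicted when the budget is full. A minimal sketch of that policy (class name, sizes, and the load callback are illustrative, not the fork's actual implementation):

```python
# Hedged sketch of "keep hot experts in VRAM": an LRU cache keyed by
# expert id, evicting the least-recently-routed expert when full.
from collections import OrderedDict

class HotExpertCache:
    def __init__(self, capacity):
        self.capacity = capacity       # max experts resident in VRAM
        self.resident = OrderedDict()  # expert_id -> weights handle
        self.loads = 0                 # counts slow CPU->GPU transfers

    def get(self, expert_id, load_fn):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as recently routed
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)     # evict coldest expert
        self.loads += 1
        self.resident[expert_id] = load_fn(expert_id)
        return self.resident[expert_id]

cache = HotExpertCache(capacity=2)
for eid in [0, 1, 0, 2, 0, 1]:                    # simulated routing trace
    cache.get(eid, load_fn=lambda i: f"weights[{i}]")
# Repeated routing to expert 0 never reloads it; cold experts get evicted.
```

Because expert routing is bursty in practice, a small resident set can absorb most lookups, which is where the reported speedup over naive layer-based offload comes from.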
r/LocalLLaMA pushed this past 900 points because it was not another score table. The hook was a local coding agent noticing and fixing its own canvas and wave-completion bugs.
r/LocalLLaMA pushed this post up because the “trust me bro” report had real operating conditions: 8-bit quantization, 64k context, OpenCode, and Android debugging.