Qwen 3.5-35B-A3B Surpasses GPT-OSS-120B as Daily Driver at 1/3 the Size
Original: Qwen 3.5-35B-A3B is beyond expectations. It's replaced GPT-OSS-120B as my daily driver and it's 1/3 the size.
Qwen 3.5-35B-A3B Exceeds All Expectations
The LocalLLaMA community is rallying around Alibaba's Qwen 3.5-35B-A3B, with a highly-upvoted post declaring it has replaced GPT-OSS-120B as the poster's daily driver — at just one-third the size.
MoE Efficiency: 35B Parameters, 3B Active
The model employs a Mixture of Experts (MoE) architecture: 35 billion total parameters, of which only roughly 3 billion are active for any given token during inference. This makes it dramatically cheaper to run than a comparable dense model while preserving output quality.
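The "3B active" figure follows from top-k expert routing: a gating network scores all experts per token, but only the k highest-scoring experts actually run. A minimal sketch of that routing step (expert count, dimensions, and k here are illustrative assumptions, not Qwen's actual configuration):

```python
# Minimal sketch of MoE top-k routing. Only k of n experts run per token,
# so only ~k/n of the expert parameters are "active" at inference time.
import numpy as np

def moe_forward(x, expert_weights, gate_weights, k=2):
    """Route one token vector through the top-k of n experts."""
    logits = x @ gate_weights                      # gate scores, shape (n_experts,)
    top = np.argsort(logits)[-k:]                  # indices of the k best experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                           # renormalized gate weights
    # Only k expert matmuls execute; all other experts stay idle.
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts, k = 64, 16, 2                        # toy sizes, not Qwen's
x = rng.standard_normal(d)
experts = rng.standard_normal((n_experts, d, d))
gate = rng.standard_normal((d, n_experts))

y = moe_forward(x, experts, gate, k=k)
active_fraction = k / n_experts                    # fraction of expert params used per token
```

With 16 experts and k=2 in this toy setup, only 12.5% of the expert parameters touch each token; the same principle is what lets a 35B-total model behave like a ~3B model at inference time.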
Real-World Performance
The original poster shared specific production use cases where the model excels:
- Automated message and email triage via N8N with priority-based batching
- Agent systems with dynamic tool selection
- General-purpose development assistance
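The triage workflow above hinges on priority-based batching: group incoming items by priority, then process the highest-priority batches first. A hedged sketch of that idea (the priority labels, batch size, and message shape are assumptions, not the original poster's N8N setup):

```python
# Illustrative priority-based batching: sort messages by priority,
# then yield fixed-size batches, most urgent first.
from itertools import groupby

PRIORITY = {"urgent": 0, "normal": 1, "low": 2}  # assumed labels

def batch_by_priority(messages, batch_size=4):
    """Yield batches of messages, highest priority first."""
    ordered = sorted(messages, key=lambda m: PRIORITY[m["priority"]])
    for _, group in groupby(ordered, key=lambda m: m["priority"]):
        group = list(group)
        for i in range(0, len(group), batch_size):
            yield group[i:i + batch_size]

msgs = [{"id": 1, "priority": "low"},
        {"id": 2, "priority": "urgent"},
        {"id": 3, "priority": "normal"},
        {"id": 4, "priority": "urgent"}]

batches = list(batch_by_priority(msgs, batch_size=2))
# The first batch holds both urgent messages (ids 2 and 4).
```

Each batch would then be handed to the model as a single request, so urgent mail is classified before the backlog of low-priority items.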
The consensus in the thread is that Qwen 3.5-35B-A3B punches well above its weight class, particularly for coding and reasoning tasks.
Qwen 3.5 Family Context
The Qwen 3.5 series is Alibaba's latest open-source model family, available in sizes ranging from compact dense models to larger MoE variants. The 35B-A3B continues a trend of Chinese open-source models closing the gap with — and in some cases surpassing — Western counterparts at comparable parameter counts.
Related Articles
LocalLLaMA reacted because the post attacks a very real pain point for running large MoE models on limited VRAM. The author tested a llama.cpp fork that tracks recently routed experts and keeps the hot ones in VRAM for Qwen3.5-122B-A10B, reporting 26.8% faster token generation than layer-based offload at a similar 22GB VRAM budget.
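The hot-expert trick described above amounts to an LRU cache over expert weights: recently routed experts stay resident in VRAM, and the coldest expert is evicted when the budget is full. A minimal sketch of that policy (class name, sizes, and the load callback are illustrative, not the fork's actual implementation):

```python
# Hedged sketch of "keep hot experts in VRAM": an LRU cache keyed by
# expert id, evicting the least-recently-routed expert when full.
from collections import OrderedDict

class HotExpertCache:
    def __init__(self, capacity):
        self.capacity = capacity       # max experts resident in VRAM
        self.resident = OrderedDict()  # expert_id -> weights handle
        self.loads = 0                 # counts slow CPU->GPU transfers

    def get(self, expert_id, load_fn):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as recently routed
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)     # evict coldest expert
        self.loads += 1
        self.resident[expert_id] = load_fn(expert_id)
        return self.resident[expert_id]

cache = HotExpertCache(capacity=2)
for eid in [0, 1, 0, 2, 0, 1]:                    # simulated routing trace
    cache.get(eid, load_fn=lambda i: f"weights[{i}]")
# Repeated routing to expert 0 never reloads it; cold experts get evicted.
```

Because expert routing is bursty in practice, a small resident set can absorb most lookups, which is where the reported speedup over naive layer-based offload comes from.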
r/LocalLLaMA pushed this past 900 points because it was not another score table. The hook was a local coding agent noticing and fixing its own canvas and wave-completion bugs.
r/LocalLLaMA pushed this post up because the “trust me bro” report had real operating conditions: 8-bit quantization, 64k context, OpenCode, and Android debugging.