HN Sees Qwen3.6-35B-A3B as a Small Active-Parameter Bet for Coding Agents
Original: Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All
The HN thread around Qwen3.6-35B-A3B had a different pulse from a normal model-release post. The headline number was not the 35B total parameter count alone; it was the sparse MoE shape. Qwen describes the model as 35B total with roughly 3B active parameters, released as open weights through Hugging Face and ModelScope, available in Qwen Studio, and planned for API access as Qwen3.6-Flash.
That matters because the community is still hunting for models that can sit inside real coding-agent loops without demanding frontier-scale serving budgets. Qwen's own table puts Qwen3.6-35B-A3B at 73.4 on SWE-bench Verified, 51.5 on Terminal-Bench 2.0, 37.0 on MCPMark, and 1397 Elo on QwenWebBench. The exact harness choices still deserve scrutiny, but HN users quickly read the release as another sign that smaller active-parameter MoE models are becoming credible for software work.
The discussion also showed what open-weight adoption now looks like in practice. One early comment pointed to an Unsloth GGUF conversion almost immediately. Others asked about fitting useful context on a 36GB Mac, whether the missing 9B or 27B variants would matter more to local users, and how to interpret benchmark tables that compare mostly against open models rather than the proprietary systems many developers actually use every day.
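The 36GB-Mac question comes down to simple arithmetic: quantized weight size is roughly total parameters times bits per weight. The sketch below is illustrative back-of-envelope math only; real GGUF files add per-block scale overhead and keep some tensors (embeddings, output head) at higher precision, so actual file sizes run somewhat larger.

```python
def quantized_weight_gib(total_params_billions: float, bits_per_weight: float) -> float:
    """Rough in-memory size of quantized model weights, in GiB.

    Illustrative only: ignores quantization-block scale overhead,
    mixed-precision tensors, KV cache, and runtime buffers.
    """
    total_bytes = total_params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# A 35B-total-parameter model at ~4.5 bits/weight (a typical Q4_K_M average)
weights_gib = quantized_weight_gib(35, 4.5)
print(f"~{weights_gib:.1f} GiB of weights")
```

At roughly 18 GiB of weights, a 36GB machine has headroom, but the remaining budget must cover the KV cache, whose size grows with context length, which is why "how much useful context fits" was the question commenters actually asked.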
The most useful signal is not that Qwen posted another score table. It is that HN treated the release as infrastructure: can it be quantized, run locally, fit inside a constrained workstation, and survive agentic coding tasks where tool use and long context matter? That is the community test open models now have to pass.
Related Articles
r/LocalLLaMA reacted because the post attacks a very real pain point in running large MoE models on limited VRAM. The author tested a llama.cpp fork that tracks recently routed experts and keeps the hot ones in VRAM for Qwen3.5-122B-A10B, reporting 26.8% faster token generation than layer-based offload at a similar 22GB VRAM budget.
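The core idea behind that fork, keeping recently routed experts resident in VRAM and evicting cold ones, is essentially an LRU cache keyed by expert ID. The toy sketch below shows the caching policy in isolation; the class and method names are hypothetical and stand in for the fork's actual CUDA-side memory management.

```python
from collections import OrderedDict

class HotExpertCache:
    """Toy sketch of recency-based expert offload: keep the most
    recently routed experts resident, evict the least recently used.
    Illustrative only; not the llama.cpp fork's real implementation."""

    def __init__(self, vram_slots: int):
        self.vram_slots = vram_slots
        self.resident = OrderedDict()  # expert_id -> placeholder for weights

    def route(self, expert_id: int) -> str:
        """Return 'hit' if the expert is already in VRAM, else load it."""
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as recently used
            return "hit"
        if len(self.resident) >= self.vram_slots:
            self.resident.popitem(last=False)  # evict the coldest expert
        self.resident[expert_id] = "weights"  # stand-in for a host->VRAM copy
        return "miss"

cache = HotExpertCache(vram_slots=2)
print([cache.route(e) for e in [1, 2, 1, 3, 2]])
```

The speedup the post reports comes from routing locality: consecutive tokens tend to reuse the same experts, so most `route` calls hit the cache and skip the slow host-to-VRAM transfer that layer-based offload pays unconditionally.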
A LocalLLaMA release post presents OmniCoder-9B as a Qwen3.5-9B-based coding agent fine-tuned on 425,000-plus agentic trajectories, with commenters focusing on its read-before-write behavior and usefulness at small model size.
A high-engagement r/LocalLLaMA post surfaced the Qwen3.5-35B-A3B model card on Hugging Face. The card emphasizes MoE efficiency, long context handling, and deployment paths across common open-source inference stacks.