Qwen3.6 on an M5 Max Made r/LocalLLaMA Talk About Keeping Code Local

Original: "I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude"

LLM · Apr 20, 2026 · By Insights AI (Reddit)

This r/LocalLLaMA post was closer to a field report than a benchmark, which is why it landed. The author said they were running Qwen3.6-35B-A3B with 8-bit quantization and a 64k context window through OpenCode on a MacBook Pro M5 Max with 128GB of memory. They also admitted it was a “trust me bro” post, but the details gave the thread something concrete to test against.
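Back-of-envelope arithmetic suggests why the quoted setup plausibly fits on a 128GB machine. This is a rough sketch, not a measurement: `bytes_per_param=1.0` approximates an 8-bit quant, and the KV cache and runtime-overhead allowances are assumed round numbers, not figures from the post.

```python
def memory_budget_gib(params=35e9, bytes_per_param=1.0,
                      kv_gib=6.0, overhead_gib=8.0):
    """Back-of-envelope resident memory for a local coding model.

    bytes_per_param=1.0 models an 8-bit quantization; kv_gib and
    overhead_gib are rough illustrative allowances, not measurements.
    """
    weights_gib = params * bytes_per_param / 2**30
    return weights_gib + kv_gib + overhead_gib

total = memory_budget_gib()
print(f"~{total:.0f} GiB of 128 GiB")  # ~47 GiB: comfortable headroom
```

Even with generous allowances, an 8-bit 35B model uses well under half of 128GB, which is consistent with the author describing it as a daily-driver setup rather than a stunt.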

The workload was not a toy prompt. The author said the model handled long research tasks with many tool calls, including investigating why R8 was breaking serialization across an Android app. They described fast responses, useful answers, and enough confidence to consider it a daily driver after using Kimi k2.5 through OpenCode zen. The line that resonated with the community was about not sending an entire codebase to random providers and hoping the trust model holds.

The comments immediately added useful friction. One user said that on an RTX 5090, the speed made the overall experience feel unmatched by cloud models. Another argued that context is cheap on Qwen and that 256k is reachable. Others pushed back: it may be quite good, but not actually Claude; and 64k context may be low for agentic coding once a tool loop starts accumulating state.
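The "context is cheap" claim from the comments can be sanity-checked with a quick KV-cache estimate. The hyperparameters below (layer count, grouped-query KV heads, head dimension, fp16 elements) are illustrative assumptions for a Qwen3-class GQA model, not published figures for this exact checkpoint.

```python
def kv_cache_bytes(tokens, layers=48, kv_heads=4, head_dim=128,
                   bytes_per_elem=2):
    """Per-token KV cache: keys + values, across all layers.

    Hyperparameters are illustrative assumptions for a Qwen3-class
    GQA model (fp16 cache), not confirmed values for this checkpoint.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens

for ctx in (65_536, 262_144):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.1f} GiB KV cache")
```

Under these assumptions, 64k of context costs about 6 GiB and even 256k stays around 24 GiB, which is why commenters could argue that a much larger window is reachable on a 128GB machine while still noting that an agentic tool loop fills whatever window it is given.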

Community discussion noted that the real signal is not a formal win over closed models. It is a threshold signal. Local inference has often been framed as possible but inconvenient. Posts like this suggest that, for some coding workflows, a 30B to 40B-class sparse model on high-memory consumer hardware can feel operational enough to change where developers are willing to run agents.

The caveat is part of the story. Hardware, quantization, KV cache settings, context length, editor integration, and task shape all matter. The thread's value is not one claim of parity. It is a practical checklist for evaluating local coding agents: privacy, latency, context cost, tool-call stability, and whether the model can stay useful across real project state.



© 2026 Insights. All rights reserved.