Qwen3.6 on an M5 Max Made r/LocalLLaMA Talk About Keeping Code Local
Original: "I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude"
This r/LocalLLaMA post was closer to a field report than a benchmark, which is why it landed. The author said they were running Qwen3.6-35B-A3B with 8-bit quantization and a 64k context window through OpenCode on a MacBook Pro M5 Max with 128GB of memory. They also admitted it was a “trust me bro” post, but the details gave the thread something concrete to test against.
The workload was not a toy prompt. The author said the model handled long research tasks with many tool calls, including investigating why R8 was breaking serialization across an Android app. They described fast responses, useful answers, and enough confidence to consider it a daily driver after using Kimi k2.5 through OpenCode zen. The line that carried the community energy was about not sending an entire codebase to random providers and hoping the trust model holds.
The comments immediately added useful friction. One user said that on an RTX 5090, the speed made the overall experience feel unmatched by cloud models. Another argued that context is cheap on Qwen and that 256k is reachable. Others pushed back: it may be quite good, but not actually Claude; and 64k context may be low for agentic coding once a tool loop starts accumulating state.
The community discussion noted that the real signal is not a formal win over closed models; it is a threshold signal. Local inference has often been framed as possible but inconvenient. Posts like this suggest that, for some coding workflows, a 30B-to-40B-class sparse model on high-memory consumer hardware can feel operational enough to change where developers are willing to run agents.
The caveats are the story. Hardware, quantization, KV cache settings, context length, editor integration, and task shape all matter. The thread's value is not one claim of parity. It is a practical checklist for evaluating local coding agents: privacy, latency, context cost, tool-call stability, and whether the model can stay useful across real project state.
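The "context cost" item in that checklist is easy to make concrete with back-of-envelope arithmetic: KV cache memory grows linearly with context length, which is why 64k vs. 256k matters on a fixed 128GB machine. The architecture numbers below (layer count, KV heads, head dimension) are illustrative assumptions, not the published Qwen3.6-35B-A3B configuration:

```python
# Back-of-envelope KV cache sizing for a local coding agent.
# Layer/head/dim values are hypothetical, chosen only to show the arithmetic.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Memory for keys + values across all layers at a given context length."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return per_token * context_len

# Hypothetical config: 48 layers, 8 KV heads (GQA), head_dim 128, fp16 cache.
cache_64k = kv_cache_bytes(48, 8, 128, 64 * 1024)
print(f"64k context KV cache: {cache_64k / 2**30:.1f} GiB")    # → 12.0 GiB
cache_256k = kv_cache_bytes(48, 8, 128, 256 * 1024)
print(f"256k context KV cache: {cache_256k / 2**30:.1f} GiB")  # → 48.0 GiB
```

Under these assumed dimensions, quadrupling the context quadruples the cache, which is why commenters treat "context is cheap" claims as hardware-dependent rather than universal.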
Related Articles
r/LocalLLaMA liked this comparison because it replaces reputation and anecdote with an explicit distribution-based yardstick, turning a messy GGUF choice into a measurable tradeoff. The post ranks community Qwen3.5-9B GGUF quants by mean KLD against a BF16 baseline, with Q8_0 variants leading on fidelity and several IQ4/Q5 options standing out on size-to-drift trade-offs; commenters pushed for better visual encoding, Gemma 4 runs, Thireus quants, and long-context testing.
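The mean-KLD yardstick in that post reduces to a simple procedure: for each evaluation token, compare the quantized model's next-token distribution to the BF16 baseline's, and average the KL divergence over tokens. A minimal sketch with made-up logits (the post's actual evaluation corpus and tooling are not reproduced here):

```python
import math

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between two next-token distributions given as raw logits."""
    def softmax(xs):
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        s = sum(exps)
        return [e / s for e in exps]
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(baseline_logits_per_token, quant_logits_per_token):
    """Average KL(baseline || quant) over evaluation tokens; lower is better."""
    klds = [kl_divergence(p, q)
            for p, q in zip(baseline_logits_per_token, quant_logits_per_token)]
    return sum(klds) / len(klds)

# Identical distributions give zero drift; a perturbed quant gives positive drift.
base = [[2.0, 1.0, 0.1], [0.5, 1.5, -0.2]]
quant = [[1.9, 1.1, 0.0], [0.4, 1.6, -0.1]]
print(mean_kld(base, base))   # 0.0
print(mean_kld(base, quant))  # small positive value
```

The appeal of this metric over perplexity alone is that it measures drift from the full-precision model directly, token by token, rather than against the text corpus.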
LocalLLaMA reacted because the post attacks a very real pain point for running large MoE models on limited VRAM. The author tested a llama.cpp fork that tracks recently routed experts and keeps the hot ones in VRAM for Qwen3.5-122B-A10B, reporting 26.8% faster token generation than layer-based offload at a similar 22GB VRAM budget.
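The mechanism that post describes, keeping recently routed experts resident in fast memory, is essentially recency-based cache placement. The sketch below illustrates the idea with a toy LRU hot set; it is not the llama.cpp fork's actual implementation, and the class and method names are hypothetical:

```python
from collections import OrderedDict

class HotExpertCache:
    """Toy sketch of recency-based expert placement for MoE offload.
    Tracks routing events and keeps the most recently used experts in a
    fixed-size 'fast' tier (standing in for VRAM); the rest stay offloaded."""

    def __init__(self, fast_slots):
        self.fast_slots = fast_slots
        self.recency = OrderedDict()  # expert_id -> None, most recent last

    def route(self, expert_id):
        """Record a routing event; return True if the expert was already hot."""
        hit = expert_id in self.fast_set()
        self.recency.pop(expert_id, None)  # refresh recency
        self.recency[expert_id] = None
        return hit

    def fast_set(self):
        """The most recently routed experts occupy the fast slots."""
        return set(list(self.recency.keys())[-self.fast_slots:])

cache = HotExpertCache(fast_slots=2)
for expert in [0, 1, 0, 2, 0]:
    cache.route(expert)
print(sorted(cache.fast_set()))  # [0, 2] — the two most recently used experts
```

The reported speedup comes from exactly this skew: if routing is bursty, a small hot set of experts serves most tokens, so pinning it in VRAM beats offloading whole layers at the same memory budget.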