LocalLLaMA Tests Qwen3.5-35B-A3B for Agentic Coding, Reports Triple-Digit Token Speeds
Original post: "Qwen3.5-35B-A3B is a gamechanger for agentic coding"
What the Community Post Claimed
A top LocalLLaMA thread reported strong local coding performance from Qwen3.5-35B-A3B. The author described running llama.cpp on a headless Linux box with a single RTX 3090, using an MXFP4 model build and a long-context configuration, while citing roughly 22 GB of VRAM usage.
The poster shared concrete launch settings and claimed two practical outcomes: sustained throughput above 100 tokens per second and successful completion of a personal coding evaluation task that had historically taken human candidates several hours. They also described a quick recreation task in an agentic workflow, positioning the model as unusually strong for local open-weight coding use.
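A long-context claim at ~22 GB of VRAM can be sanity-checked with a back-of-envelope KV-cache estimate, since at large context lengths the cache, not the weights, often dominates the budget. The sketch below uses illustrative dimensions (48 layers, 8 KV heads, head dimension 128) that are assumptions, not Qwen3.5-35B-A3B's actual config, and assumes an unquantized FP16 KV cache:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size: K and V tensors for every layer,
    each n_kv_heads * head_dim values per token of context."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical dimensions for a ~35B MoE model at a 64K context.
size_gib = kv_cache_bytes(48, 8, 128, 65536) / 2**30
print(f"{size_gib:.1f} GiB")  # 12.0 GiB under these assumed dimensions
```

Numbers like these explain why quantized KV caches and careful context budgeting matter so much for single-GPU setups.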
Why the Thread Drew Attention
- It combined reproducible setup details with claimed real task outcomes
- It focused on local hardware economics rather than cloud API performance
- It framed results around agent tool usage, not only static benchmark scores
Commenters contributed a wider range of evidence. Some reported similarly high throughput on newer consumer and workstation GPUs. Others saw weaker tool-use behavior despite good code-reading quality. Several practitioners highlighted that agent results depend heavily on surrounding system choices: quantization format, framework implementation, number of tools in the schema, and context-management strategy.
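The "number of tools in the schema" point is concrete: every tool definition is serialized into the prompt, so a large schema consumes context and can degrade tool-selection accuracy. A minimal sketch of an OpenAI-style tools payload of the kind llama.cpp's OpenAI-compatible server accepts; the tool names `read_file` and `run_tests` are hypothetical examples, not from the thread:

```python
def make_tool(name: str, description: str, params: dict) -> dict:
    """Build one OpenAI-style function-tool entry."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }

tools = [
    make_tool("read_file", "Read a file from the workspace",
              {"path": {"type": "string"}}),
    make_tool("run_tests", "Run the project's test suite",
              {"target": {"type": "string"}}),
]
# Each entry here is injected into the model's context; trimming unused
# tools is one of the cheapest ways to improve agent reliability.
```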
How to Read These Results
This is still community evidence, not a controlled benchmark paper. But it is useful evidence because the thread exposes conditions under which local coding models either perform surprisingly well or degrade quickly. The practical message is not simply “this model is fastest,” but that end-to-end agent design now determines whether local LLMs can replace portions of API-first coding loops.
For teams evaluating local deployment, this thread is a reminder to test entire pipelines: model + quant + runtime + tool schema + workload. Qwen3.5-35B-A3B appears capable of strong coding output in tuned environments, yet variance across real setups remains high enough that production decisions should be validated with internal workloads before broad rollout.
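One lightweight way to act on that advice is a small end-to-end smoke suite run against whatever model + quant + runtime combination is under evaluation, measuring both task pass rate and throughput in the same run. A minimal sketch, where `generate` is a placeholder for any client returning generated text plus a completion-token count:

```python
import time

def run_smoke_suite(generate, tasks):
    """Run (prompt, checker) tasks against a generate(prompt) -> (text, n_tokens)
    callable; return task pass rate and aggregate tokens per second."""
    passed, tokens = 0, 0
    start = time.perf_counter()
    for prompt, checker in tasks:
        text, n_tokens = generate(prompt)
        tokens += n_tokens
        if checker(text):
            passed += 1
    elapsed = time.perf_counter() - start
    return passed / len(tasks), tokens / elapsed

# Usage with a stand-in generator; swap in a real llama.cpp client.
fake = lambda prompt: ("def add(a, b):\n    return a + b", 12)
pass_rate, tps = run_smoke_suite(
    fake, [("write add()", lambda t: "return a + b" in t)])
```

Running the same suite across quant formats and runtimes makes the variance the thread describes directly measurable on internal workloads.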
Source thread: r/LocalLLaMA discussion
Related model page: Hugging Face - Qwen3.5-35B-A3B
Related Articles
LocalLLaMA reacted because the --fit flag challenged the old rule of thumb that anything outside VRAM means painfully slow inference.
A March 2026 r/LocalLLaMA post with 126 points and 45 comments highlighted a practical guide for running Qwen3.5-27B through llama.cpp and wiring it into OpenCode. The post stands out because it covers the operational details that usually break local coding setups: quant choice, chat-template fixes, VRAM budgeting, Tailscale networking, and tool-calling behavior.
A recent r/LocalLLaMA post presents Qwen3.5 27B as an unusually strong local inference sweet spot. The author reports about 19.7 tokens per second on an RTX A6000 48GB with llama.cpp and a 32K context, while the comments turn into a detailed debate about dense-versus-MoE VRAM economics.