LocalLLaMA Tests Qwen3.5-35B-A3B for Agentic Coding, Reports Triple-Digit Token Speeds

Original post: "Qwen3.5-35B-A3B is a gamechanger for agentic coding"

LLM · Feb 26, 2026 · By Insights AI (Reddit) · 2 min read

What the Community Post Claimed

A top LocalLLaMA thread reported impressive local coding performance from Qwen3.5-35B-A3B. The author described running llama.cpp on a headless Linux box with a single RTX 3090, using an MXFP4 model build and a long-context configuration, while citing roughly 22 GB of VRAM usage.

The poster shared concrete launch settings and claimed two practical outcomes: sustained throughput above 100 tokens per second and successful completion of a personal coding evaluation task that had historically taken human candidates several hours. They also described a quick recreation task in an agentic workflow, positioning the model as unusually strong for local open-weight coding use.
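The thread's exact launch flags are not reproduced in this summary, so the following is only an illustrative llama.cpp server invocation matching the described setup (single 24 GB GPU, full offload, long context). The model filename is a placeholder, and flag syntax can vary between llama.cpp builds:

```shell
# Hypothetical llama-server launch approximating the setup in the post.
# -ngl 99 offloads all layers to the GPU; -c sets the context window size.
llama-server \
  -m qwen3.5-35b-a3b-mxfp4.gguf \
  -ngl 99 \
  -c 65536 \
  --host 127.0.0.1 --port 8080
```

VRAM usage scales with both the quantized weights and the KV cache, so the context size chosen here directly affects whether the model fits on a 24 GB card.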

Why the Thread Drew Attention

  • It combined reproducible setup details with claimed real task outcomes
  • It focused on local hardware economics rather than cloud API performance
  • It framed results around agent tool usage, not only static benchmark scores

Commenters added a wider evidence set. Some reported similarly high throughput on newer consumer/workstation GPUs. Others saw weaker tool-use behavior despite good code reading quality. Several practitioners highlighted that agent results depend heavily on surrounding system choices: quantization format, framework implementation, number of tools in the schema, and context-management strategy.
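One variable commenters flagged, the number of tools in the schema, has a direct and measurable cost: every tool definition is serialized into the prompt on each turn. A minimal sketch of that effect, using hypothetical tool definitions in OpenAI-style function format and word count as a rough stand-in for tokens:

```python
import json

def schema_prompt_cost(tools: list[dict]) -> int:
    """Approximate prompt overhead of a tool schema by word count.

    Real tokenizers differ, but the trend holds: more tools and longer
    descriptions mean more tokens spent before the agent does any work.
    """
    return len(json.dumps(tools).split())

# Hypothetical tool definitions for illustration only.
read_file = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}
run_tests = {
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project test suite and return the output.",
        "parameters": {"type": "object", "properties": {}},
    },
}

print(schema_prompt_cost([read_file]))
print(schema_prompt_cost([read_file, run_tests]))  # grows with each tool
```

Trimming unused tools from the schema is one of the cheapest context-management levers available, which is part of why identical models can behave so differently across agent frameworks.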

How to Read These Results

This is still community evidence, not a controlled benchmark paper. But it is useful evidence because the thread exposes conditions under which local coding models either perform surprisingly well or degrade quickly. The practical message is not simply “this model is fastest,” but that end-to-end agent design now determines whether local LLMs can replace portions of API-first coding loops.

For teams evaluating local deployment, this thread is a reminder to test entire pipelines: model + quant + runtime + tool schema + workload. Qwen3.5-35B-A3B appears capable of strong coding output in tuned environments, yet variance across real setups remains high enough that production decisions should be validated with internal workloads before broad rollout.
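A minimal sketch of what "test the entire pipeline" can mean in practice: sweep configurations (quant, runtime, tool count) against a fixed internal task set and compare pass rates, rather than single-axis benchmark numbers. Everything below is hypothetical scaffolding; `run_task` is a stub that a real harness would replace with a call into the actual agent stack:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class PipelineConfig:
    quant: str      # e.g. "mxfp4", "q4_k_m"
    runtime: str    # e.g. "llama.cpp"
    num_tools: int  # size of the tool schema exposed to the agent

def run_task(config: PipelineConfig, task_id: str) -> bool:
    """Stub: replace with a call into your real agent pipeline.

    The placeholder heuristic just makes the sketch runnable end to end.
    """
    return config.num_tools <= 8

def evaluate(configs, tasks):
    """Return the pass rate per configuration over the internal task set."""
    return {
        cfg: sum(run_task(cfg, t) for t in tasks) / len(tasks)
        for cfg in configs
    }

configs = [
    PipelineConfig(q, r, n)
    for q, r, n in product(["mxfp4"], ["llama.cpp"], [4, 16])
]
rates = evaluate(configs, tasks=[f"task-{i}" for i in range(10)])
for cfg, rate in rates.items():
    print(cfg, f"pass rate: {rate:.0%}")
```

Holding the task set fixed while varying one pipeline component at a time is what turns anecdotes like this thread into a decision you can defend internally.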

Source thread: r/LocalLLaMA discussion
Related model page: Hugging Face - Qwen3.5-35B-A3B



