13 Months After the DeepSeek Moment: How Far Has Local AI Come?

13 Months of Local AI Progress

In early 2025, a Hugging Face engineer tweeted about how to run the frontier-level DeepSeek R1 model at Q8 quantization at approximately 5 tokens per second — requiring about $6,000 in hardware.

This r/LocalLLaMA post (176 upvotes) provides a striking update: you can now run a significantly more capable model at the same speed on a $600 mini PC. Specifically: Qwen3-27B at Q4 quantization runs at roughly 5 t/s on a $600 AOOSTAR mini PC.

Want More Usable Speeds?

For more practical inference speeds, Qwen3.5-35B-A3B (MoE architecture) at Q4/Q5 quantization runs at 17-20 t/s on comparable hardware. That is a practically useful speed for everyday AI assistance tasks.

Looking Ahead

The author speculates that at this pace, a 4B model better than today's best could be running locally within a year. The trajectory from $6,000 for 5 t/s frontier inference to $600 for better-than-frontier inference in 13 months suggests that genuinely capable local AI on consumer hardware is no longer a distant prospect.

Why This Matters

The democratization of local AI goes beyond cost savings. It enables privacy-first inference without cloud dependencies, makes high-quality AI accessible in regions with limited internet infrastructure, and shifts the balance of power away from cloud AI providers. The speed of this progress is one of the most remarkable dynamics in the current AI landscape.

LLM Reddit Apr 20, 2026 2 min read

Qwen3.6 lit up LocalLLaMA because the agent actually debugged the app

r/LocalLLaMA pushed this past 900 points because it was not another score table. The hook was a local coding agent noticing and fixing its own canvas and wave-completion bugs.

#qwen #local-llm #agents

LLM Reddit Apr 20, 2026 2 min read

Qwen3.6 on an M5 Max Made r/LocalLLaMA Talk About Keeping Code Local

r/LocalLLaMA pushed this post up because the “trust me bro” report had real operating conditions: 8-bit quantization, 64k context, OpenCode, and Android debugging.

#qwen #local-llm #coding-agents

LLM Reddit 4d ago 2 min read

LocalLLaMA Gets a MacBook Air M5 Benchmark for 21 Coding Models, Not Just Vibes

A r/LocalLLaMA benchmark compared 21 local coding models on HumanEval+, speed, and memory, putting Qwen 3.6 35B-A3B on top while surfacing practical RAM and tok/s trade-offs.

#localllama #benchmark #qwen