Taalas Claims to Bake Entire LLMs Into Silicon for 17K Tokens/Second
Original: Taalas: LLMs baked into hardware. No HBM, weights and model architecture in silicon -> 16,000 tokens/second
The Idea: Etch the Entire LLM Into Silicon
Startup Taalas is proposing a radical departure from standard AI inference architecture: instead of streaming LLM weights from memory on general-purpose GPUs or cloud clusters, it etches the entire model, weights and architecture alike, directly onto a custom ASIC. No HBM, no memory bottleneck.
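To see why removing HBM matters, a rough roofline sketch helps: in single-stream decoding, every weight must be read once per generated token, so per-user throughput is capped by weight bandwidth divided by model size. The numbers below are illustrative assumptions (an 8B model with 8-bit weights, roughly one H100's HBM bandwidth, and a made-up on-die figure), not Taalas specifications.

```python
# Back-of-envelope roofline for single-stream decoding: each generated token
# reads every weight once, so per-user throughput is capped by
# (weight bandwidth) / (model size in bytes). All numbers are illustrative
# assumptions for a Llama-3.1-8B-class model, not Taalas specifications.

MODEL_PARAMS = 8e9        # 8B parameters
BYTES_PER_PARAM = 1.0     # assume 8-bit weights
model_bytes = MODEL_PARAMS * BYTES_PER_PARAM


def decode_ceiling(weight_bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound ceiling on single-user decode throughput, in tokens/s."""
    return weight_bandwidth_bytes_per_s / model_bytes


# ~3.35 TB/s is roughly one H100's HBM3 bandwidth.
print(f"HBM-streamed ceiling: {decode_ceiling(3.35e12):,.0f} tokens/s")   # ~419
# A hypothetical 150 TB/s of aggregate on-die weight bandwidth, for contrast.
print(f"On-die ceiling:       {decode_ceiling(150e12):,.0f} tokens/s")    # ~18,750
```

At HBM speeds the ceiling for an 8B model is a few hundred tokens per second per user, which is why a per-user figure in the tens of thousands only makes sense if the weights never leave the die.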
Key Claims
- >17,000 tokens per second per user
- <1ms latency
- 20x cheaper than cloud inference
- 60-day turnaround from model selection to custom chip
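The throughput and latency claims above can be sanity-checked against each other. Note that the article does not say what the <1ms figure measures, so treating it as time-to-first-token is an assumption.

```python
# Sanity-check the headline numbers against each other. At a sustained
# single-user rate of 17,000 tokens/s, the mean time per generated token is:
tokens_per_sec = 17_000
per_token_s = 1 / tokens_per_sec
print(f"{per_token_s * 1e6:.0f} us per token")            # ~59 microseconds

# If the separate "<1 ms" figure is time-to-first-token (an assumption; the
# article does not specify), it corresponds to roughly this many token intervals:
print(f"{1e-3 / per_token_s:.0f} token intervals per millisecond")
```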
The Trade-off
In an era where model architectures evolve every few weeks, locking a model into silicon is a significant bet. Taalas acknowledges this risk and positions its approach for domains where latency matters more than raw intelligence: real-time speech models, avatar generation, and computer vision applications.
The 60-day chip cycle is its answer to the obsolescence problem: faster than traditional ASIC timelines, though still slower than a model weight update. A Llama 3.1 8B demo is available at ChatJimmy.ai for anyone to test the claimed speeds directly.
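For anyone who wants to time the demo rather than eyeball it, here is a minimal measurement sketch. It assumes an OpenAI-compatible streaming endpoint, which the article does not confirm ChatJimmy.ai provides (it only mentions a browser demo), so the URL, API key, and model id below are placeholders.

```python
# Minimal sketch for timing streamed generation against an OpenAI-compatible
# chat endpoint. Whether ChatJimmy.ai exposes such an API is an assumption
# (the article only mentions a browser demo), so base_url, api_key, and the
# model id are placeholders to be replaced.
import time

from openai import OpenAI

client = OpenAI(base_url="https://example.invalid/v1", api_key="placeholder")

start = time.perf_counter()
chunk_times = []
stream = client.chat.completions.create(
    model="llama-3.1-8b",  # placeholder model id
    messages=[{"role": "user", "content": "Write 500 words about silicon."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:  # skip bookkeeping chunks with no choices
        chunk_times.append(time.perf_counter())

ttft = chunk_times[0] - start                    # time to first streamed chunk
decode_window = chunk_times[-1] - chunk_times[0]
n_chunks = len(chunk_times) - 1                  # chunks roughly track tokens
print(f"time to first chunk: {ttft * 1e3:.1f} ms")
print(f"streaming rate: {n_chunks / decode_window:,.0f} chunks/s")
```

Streamed chunks only approximate tokens, so this measures an upper-level user-visible rate rather than the exact decoder throughput.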
Related Articles
Taalas has released an ASIC chip that physically etches Llama 3.1 8B model weights into silicon, achieving 17,000 tokens per second—10x faster, 10x cheaper, and 10x more power-efficient than GPU-based inference systems.
A high-engagement Hacker News thread spotlights Taalas’ claim that model-specific silicon can cut inference latency and cost, including a hard-wired Llama 3.1 8B deployment reportedly reaching 17K tokens/sec per user.
Andrej Karpathy highlights the fundamental memory+compute trade-off challenge in LLMs: fast but small on-chip SRAM versus large but slow off-chip DRAM. He calls optimizing this the most intellectually rewarding puzzle in AI infrastructure today, pointing to NVIDIA's $4.6T market cap as proof.