Taalas Claims to Bake Entire LLMs Into Silicon for 17K Tokens/Second

Original: "Taalas: LLMs baked into hardware. No HBM, weights and model architecture in silicon → 16,000 tokens/second"

LLM · Feb 23, 2026 · By Insights AI (Reddit)

The Idea: Etch the Entire LLM Into Silicon

Startup Taalas is proposing a radical departure from standard AI inference architecture: instead of running LLM weights on general-purpose GPUs or cloud clusters, etch the entire model — weights and architecture — directly onto a custom ASIC. No HBM, no memory bottlenecks.

Key Claims

  • >17,000 tokens per second per user
  • <1ms latency
  • 20x cheaper than cloud inference
  • 60-day turnaround from model selection to custom chip
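The first two claims are at least internally consistent with each other; a quick arithmetic check (illustration only):

```python
# At the claimed throughput, the average time budget per token is well
# under the claimed 1 ms latency figure.
tokens_per_sec = 17_000
per_token_ms = 1000 / tokens_per_sec
print(f"{per_token_ms:.3f} ms/token")  # prints "0.059 ms/token"
```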

The Trade-off

In an era where model architectures evolve every few weeks, locking a model into silicon is a significant bet. Taalas acknowledges this risk and positions its approach for domains where latency matters more than raw intelligence: real-time speech models, avatar generation, and computer vision applications.

The 60-day chip cycle is their answer to the obsolescence problem — faster than traditional ASIC timelines, though still slower than a model weight update. A Llama 3.1 8B demo is available at ChatJimmy.ai for anyone to test the claimed speeds directly.
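Anyone wanting to check a tokens-per-second claim against a streaming endpoint can time the token stream directly. A generic sketch (`stream_tokens` is a placeholder for whatever iterator your client library yields; this is not a real Taalas or ChatJimmy API):

```python
import time

def measure_tokens_per_sec(stream_tokens) -> float:
    """Count tokens from any iterable as they arrive and divide by wall time."""
    start = time.perf_counter()
    count = sum(1 for _ in stream_tokens)
    elapsed = time.perf_counter() - start
    return count / elapsed

# Usage with any streaming client would look like:
#   rate = measure_tokens_per_sec(client.stream("Hello"))
```

Note that wall-clock timing includes network latency, so a remote measurement will understate the chip's raw throughput.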

© 2026 Insights. All rights reserved.