#taalas

LLM Reddit Feb 23, 2026 1 min read

Taalas Claims to Bake Entire LLMs Into Silicon for 17K Tokens/Second

Startup Taalas proposes baking an LLM's full weights and architecture into custom ASICs, claiming 17K+ tokens/second per user, sub-1 ms latency, and 20x lower cost than cloud inference, with chips produced on a 60-day cycle.

#taalas #llm #asic


LLM Hacker News Feb 22, 2026 2 min read

Taalas Prints LLM Weights into Silicon: 17,000 Tokens/sec at 10x Lower Cost

Taalas has released an ASIC with the Llama 3.1 8B model weights physically etched into the silicon, achieving 17,000 tokens per second: 10x faster, 10x cheaper, and 10x more power-efficient than GPU-based inference systems.

#taalas #asic #llm


LLM Hacker News Feb 20, 2026 2 min read

Taalas proposes model-specific silicon for low-latency AI inference

A high-engagement Hacker News thread spotlights Taalas’ claim that model-specific silicon can cut inference latency and cost, including a hard-wired Llama 3.1 8B deployment reportedly reaching 17K tokens/sec per user.

#llm #inference #ai-hardware