AI · Reddit · Feb 22, 2026 · 1 min read
Startup Taalas is taking a radical approach to AI inference: etching an LLM's weights and architecture directly into a silicon chip. Its Llama 3.1 8B demo reaches 16,000 tokens per second, but the approach bets that model architectures won't change, since a hardwired chip cannot be updated to run a different model.