LLM Hacker News Jun 2, 2026 2 min read
The popular thread turned a local-inference stunt into a practical discussion about decoding bottlenecks, power cost, and runtime knobs.
A March 19, 2026 Hacker News post about Kitten TTS reached 512 points and 172 comments at crawl time. KittenML says its 15M-, 40M-, and 80M-parameter ONNX speech models target CPU inference with eight English voices and 24 kHz output.
Hacker News pushed Microsoft's bitnet.cpp back into view, treating it less as a new 100B checkpoint and more as an infrastructure play for 1.58-bit inference and lower-power local LLM deployment.