#cpu-inference

LLM Hacker News Jul 16, 2026 2 min read

Gemma 4 26B runs at 5 tok/s on a 13-year-old Xeon

The HN debate was not just “old hardware still works.” A patched ik_llama.cpp path got Gemma 4 26B-A4B running CPU-only on dual Ivy Bridge Xeons, raising practical questions about local inference cost, control, and fallback capacity.

#gemma #cpu-inference #llama-cpp

LLM Hacker News Jun 2, 2026 2 min read

A 2016 Xeon Runs Gemma 4, but the Real Story Is Memory Bandwidth

The popular Hacker News thread turned a local-inference stunt into a practical discussion of decoding bottlenecks, power cost, and runtime knobs.

#local-ai #gemma #cpu-inference

AI Hacker News Mar 20, 2026 2 min read

Hacker News Spots Kitten TTS Pushing 25-to-80 MB CPU-First Speech Models

A March 19, 2026 Hacker News post about Kitten TTS reached 512 points and 172 comments at crawl time. KittenML says its 15M, 40M, and 80M ONNX speech models target CPU inference with eight English voices and 24 kHz output.

#text-to-speech #edge-ai #onnx

LLM Hacker News Mar 11, 2026 2 min read

Hacker News Highlights BitNet's Bid for 100B-Class 1-Bit Inference on One CPU

Hacker News pushed Microsoft's bitnet.cpp back into view, treating it less as a new 100B checkpoint and more as an infrastructure play for 1.58-bit inference and lower-power local LLM deployment.

#bitnet #local-llm #cpu-inference