LLM Reddit Mar 1, 2026 2 min read
r/LocalLLaMA Benchmarks: <code>Krasis</code> reports 3,324 tok/s prefill for 80B MoE on one RTX 5080
A r/LocalLLaMA post (score 180, 53 comments) shared benchmark data for <code>Krasis</code>, a hybrid CPU/GPU runtime aimed at large MoE models. The key claim is that GPU-heavy prefill plus CPU decode can reduce long-context waiting time even when full models do not fit in consumer VRAM.