The top comment went straight to the CP joke, but the post held up because the technical claim was concrete: 2-3x forward-pass and 2x backward-pass speedups for Gated DeltaNet (GDN) chunked prefill, aimed at long-context and edge-side agentic inference.
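For readers unfamiliar with the workload, here is a minimal NumPy sketch of what chunked prefill means for a gated delta-rule layer, assuming GDN refers to Gated DeltaNet and its published recurrence. It is a plain reference loop, not the kernel from the post and not the source of the quoted speedups. The function names `gated_delta_prefill` and `chunked_prefill` are hypothetical, chosen for illustration.

```python
import numpy as np

def gated_delta_prefill(q, k, v, alpha, beta, state=None):
    """Token-by-token gated delta rule over one span of the prompt.

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) gates in (0, 1).
    state: (d_v, d_k) recurrent memory carried in from earlier tokens.
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k)) if state is None else state.copy()
    out = np.empty((T, d_v))
    for t in range(T):
        # S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
        S = alpha[t] * (S - beta[t] * np.outer(S @ k[t], k[t])) \
            + beta[t] * np.outer(v[t], k[t])
        out[t] = S @ q[t]
    return out, S

def chunked_prefill(q, k, v, alpha, beta, chunk_size):
    """Process the prompt chunk by chunk, carrying only the fixed-size state."""
    state, outs = None, []
    for start in range(0, len(q), chunk_size):
        sl = slice(start, start + chunk_size)
        o, state = gated_delta_prefill(q[sl], k[sl], v[sl],
                                       alpha[sl], beta[sl], state)
        outs.append(o)
    return np.concatenate(outs), state
```

The state handed from one chunk to the next is a single `(d_v, d_k)` matrix regardless of prompt length, which is one reason this family of layers gets attention for long-context and memory-constrained inference. Optimized kernels typically replace the inner per-token loop with a chunkwise-parallel formulation; this sketch does not attempt that.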