Why it matters: Cloudflare is attacking the memory-bandwidth bottleneck in LLM serving rather than simply buying more GPUs. Its post reports a 15-22% reduction in model size, roughly 3 GB of VRAM saved on Llama 3.1 8B, and open-sourced GPU kernels.
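The reported figures are internally consistent. A minimal back-of-envelope sketch, assuming the savings apply to the fp16/bf16 weights of the 8B-parameter model (an assumption, since the post's exact baseline isn't stated here):

```python
# Sanity check: do "15-22% smaller" and "~3 GB saved on Llama 3.1 8B" agree?
# Assumption: baseline is 8e9 parameters stored as 2-byte fp16/bf16 weights.
params = 8.0e9               # Llama 3.1 8B parameter count
bytes_per_param = 2          # fp16/bf16
base_gb = params * bytes_per_param / 1e9   # ~16 GB of weight memory

for frac in (0.15, 0.22):    # the reported 15-22% reduction band
    print(f"{frac:.0%} reduction -> {base_gb * frac:.1f} GB saved")
# 15% of 16 GB is 2.4 GB and 22% is 3.5 GB, bracketing the ~3 GB claim.
```

Less weight memory also means fewer bytes streamed per token, which is why a size reduction translates directly into relief on the memory-bandwidth bottleneck.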