#linear-attention

LLM X/Twitter Apr 30, 2026 2 min read

Qwen's FlashQLA lifts linear attention speed 2-3x on Hopper

Why it matters: kernel performance largely decides whether long-context and edge-side agent systems stay theoretical or become cheap enough to run. Qwen says FlashQLA delivers a 2-3x forward speedup and a 2x backward speedup over the FLA Triton kernel on NVIDIA Hopper.

#qwen #linear-attention #kernels

LLM Reddit Apr 29, 2026 2 min read

LocalLLaMA liked the FlashQLA jokes, but the real hook was the numbers

The top comment went straight to the CP joke, but the post held attention because the technical claim was concrete: 2-3x forward and 2x backward speedups for Gated DeltaNet (GDN) chunked prefill, aimed at long-context and edge-side agentic inference.

#qwen #flashqla #linear-attention