LLM, X/Twitter, Apr 30, 2026
Why it matters: kernel work decides whether long-context and edge-side agent systems stay theoretical or become cheap enough to run. Qwen reports that FlashQLA delivers a 2-3x forward-pass speedup and a 2x backward-pass speedup over the Triton kernel from FLA (flash-linear-attention) on NVIDIA Hopper GPUs.
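The forward and backward figures do not simply add up. The sketch below combines them into one speedup for a single forward-plus-backward pass of the kernel. It assumes a hypothetical baseline in which the backward pass takes twice as long as the forward pass. That ratio is not a figure Qwen published. The function name `combined_speedup` is also illustrative.

```python
def combined_speedup(fwd_speedup: float, bwd_speedup: float, bwd_to_fwd_ratio: float) -> float:
    """Overall speedup for one forward + backward pass of a kernel.

    Baseline times are normalized so forward = 1.0 and backward = bwd_to_fwd_ratio
    (a hypothetical split, not a reported number). Each phase's new time is its
    baseline time divided by that phase's own speedup.
    """
    if fwd_speedup <= 0 or bwd_speedup <= 0 or bwd_to_fwd_ratio < 0:
        raise ValueError("speedups must be positive and the ratio non-negative")
    baseline = 1.0 + bwd_to_fwd_ratio
    accelerated = 1.0 / fwd_speedup + bwd_to_fwd_ratio / bwd_speedup
    return baseline / accelerated


# Reported: 2-3x forward, 2x backward. Assumed (hypothetical): backward = 2x forward time.
for fwd in (2.0, 3.0):
    print(f"forward {fwd:.0f}x, backward 2x -> {combined_speedup(fwd, 2.0, 2.0):.2f}x overall")
```

Under the assumed 2:1 split, the combined figure stays close to the backward speedup. For inference, which runs only the forward pass, the forward figure applies directly. These numbers cover kernel time only, not end-to-end model throughput.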