#fp8

LLM X/Twitter Apr 28, 2026 2 min read

vLLM lifts FP8 long-context accuracy from 13% to 89%

Why it matters: FP8 inference only pays off if its long-context accuracy collapse can be fixed. vLLM says a two-level accumulation change lifted needle-in-a-haystack accuracy at 128k tokens from 13% to 89% while preserving FP8 decode speed.
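To give a feel for why a two-level accumulation scheme can matter, here is a minimal, generic sketch. It is not vLLM's kernel, and the details of vLLM's change are not described here. It assumes FP16 rounding (via the standard library's `struct` half-float format) as a stand-in for a low-precision accumulator, since Python has no built-in FP8 type. The function names and the chunk size of 64 are illustrative choices only.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def naive_sum_fp16(values):
    """Single-level sum: the running total is rounded to FP16 after every add."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


def two_level_sum(values, chunk=64):
    """Two-level sum (hypothetical helper, illustrative only).

    Inner level: short chunks are accumulated in low precision (FP16 here).
    Outer level: the chunk partials are added in a wider accumulator
    (a Python float, i.e. double precision).
    """
    total = 0.0
    for start in range(0, len(values), chunk):
        partial = 0.0
        for v in values[start:start + chunk]:
            partial = to_fp16(partial + to_fp16(v))
        total += partial  # wide accumulation across chunks
    return total


if __name__ == "__main__":
    values = [0.01] * 4096  # many small terms, as in a long reduction
    exact = 0.01 * 4096
    print("exact     :", exact)
    print("naive     :", naive_sum_fp16(values))
    print("two-level :", two_level_sum(values))
```

In the single-level version, once the running total grows large enough, each small term falls below half the spacing between representable values and is rounded away, so the sum stops growing. Keeping the low-precision partial sums short and combining them in a wider accumulator avoids that stall. Longer reductions give the single-level version more opportunity to lose terms, which is one plausible reason this kind of error would show up at long context lengths.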

#vllm #fp8 #inference

LLM X/Twitter Apr 22, 2026 1 min read

NVIDIA NeMo RL uses FP8 to speed Qwen3-8B training by 1.48x

Why it matters: post-training agents increasingly depend on reinforcement learning throughput, not only inference speed. NVIDIA says NeMo RL’s FP8 path speeds RL workloads by 1.48x on Qwen3-8B-Base while tracking BF16 accuracy.

#nvidia #nemo-rl #fp8