LMSYS posts Day-0 DeepSeek-V4 speeds up to 266 tok/s on H200

Why this tweet mattered more than another benchmark brag

LMSYS published the kind of systems post that determines whether a model is usable on day one. The organization’s account wrote that it had shipped Day-0 support for DeepSeek-V4 through SGLang and Miles, and it attached concrete throughput numbers rather than broad superlatives. The tweet claims 199 tok/s on B200 for the 1.6T Pro model and 266 tok/s on H200 for the 284B Flash model at 4K context, while saying throughput still holds up at 900K context with only modest drop-off.

“199 tok/s on B200… 266 tok/s on H200… throughput stays strong at 900K context.”

LMSYS is not a random hype account. It is one of the more closely watched accounts for model evaluation and serving-stack work, and its linked blog is a dense technical breakdown rather than a short promo page. The article, dated April 25, 2026, describes Day-0 inference and RL support for DeepSeek-V4 and spells out what had to be built around hybrid sparse attention, manifold-constrained hyper-connections, and FP4 expert weights. It also notes a 1M-token context window and claims up to 3x throughput improvement for long-context serving via HiSparse.

Why systems support is the real product here

Benchmark headlines often flatten the hard part of model launches. A new checkpoint matters only if inference kernels, cache strategies, expert routing, and training stacks can keep up. LMSYS’s post is interesting because it treats DeepSeek-V4 as a deployment problem, not just a model artifact. The linked write-up also claims that a fused compression path can reach up to 80% of peak memory bandwidth on H200 and run more than 10x faster than a naive PyTorch pipeline in that stage.
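To put the 80% figure in rough perspective, here is a back-of-the-envelope sketch. It assumes a peak H200 memory bandwidth of 4.8 TB/s, which is NVIDIA's published spec for the SXM part and does not come from the LMSYS post. The 1 GiB buffer size and the helper names are likewise hypothetical, chosen only for illustration.

```python
# Assumption: 4.8 TB/s is NVIDIA's published peak HBM3e bandwidth for the
# H200 SXM; the LMSYS post itself does not state this figure.
H200_PEAK_BYTES_PER_S = 4.8e12


def effective_bandwidth(peak_bytes_per_s: float, utilization: float) -> float:
    """Return achieved bandwidth in bytes/s for a utilization in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_bytes_per_s * utilization


def stream_time_ms(num_bytes: int, bytes_per_s: float) -> float:
    """Return the time in milliseconds to move num_bytes at bytes_per_s."""
    return num_bytes / bytes_per_s * 1e3


# 80% of peak, the upper bound claimed for the fused compression path.
achieved = effective_bandwidth(H200_PEAK_BYTES_PER_S, 0.80)

# Hypothetical workload: streaming a 1 GiB buffer once through that stage.
one_gib = 1024 ** 3
elapsed_ms = stream_time_ms(one_gib, achieved)

print(f"achieved bandwidth: {achieved / 1e12:.2f} TB/s")
print(f"time to stream 1 GiB: {elapsed_ms:.3f} ms")
```

Under those assumptions, a memory-bound stage running at 80% of peak has little headroom left, which is why a fused kernel at that level is a meaningful claim for a bandwidth-limited step.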

What to watch next is whether other open-source serving stacks reproduce these numbers and whether launch-day support turns into stable support once real user traffic arrives. If the LMSYS figures hold, DeepSeek-V4 will be notable not only for its open weights but also for how quickly the surrounding software stack caught up.

Source: LMSYS source tweet · LMSYS technical blog
