LMSYS posts Day-0 DeepSeek-V4 speeds up to 266 tok/s on H200
Original: LMSYS published Day-0 DeepSeek-V4 inference and RL support results
Why this tweet mattered more than another benchmark brag
LMSYS published the kind of systems post that determines whether a model is usable on day one. The organization’s account wrote that it had shipped Day-0 support for DeepSeek-V4 through SGLang and Miles, and it attached concrete throughput numbers rather than broad superlatives. The tweet claims 199 tok/s on B200 for the 1.6T Pro model and 266 tok/s on H200 for the 284B Flash model at 4K context, while saying throughput still holds up at 900K context with only modest drop-off.
“199 tok/s on B200… 266 tok/s on H200… throughput stays strong at 900K context.”
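To put the quoted rates in more familiar terms, the short sketch below converts tokens per second into average time per token and time per response. It assumes the figures describe a steady decode rate for a single stream, which the tweet does not state explicitly; the 1,000-token response length is a hypothetical value chosen only for illustration.

```python
# Throughput figures quoted in the LMSYS tweet (tokens per second at 4K context).
QUOTED_TOK_PER_S = {
    "DeepSeek-V4 Pro (1.6T) on B200": 199.0,
    "DeepSeek-V4 Flash (284B) on H200": 266.0,
}


def ms_per_token(tok_per_s: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tok_per_s


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Time to emit `tokens` tokens at a steady rate (ignores prefill and queueing)."""
    return tokens / tok_per_s


# Hypothetical response length, not a number from the source.
RESPONSE_TOKENS = 1000

for name, rate in QUOTED_TOK_PER_S.items():
    print(
        f"{name}: {ms_per_token(rate):.2f} ms/token, "
        f"{seconds_for(RESPONSE_TOKENS, rate):.2f} s per {RESPONSE_TOKENS} tokens"
    )
```

Under those assumptions the quoted rates work out to roughly 5.0 ms per token on B200 and 3.8 ms per token on H200. Real end-to-end latency would also include prompt processing and scheduling, which this arithmetic leaves out.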
LMSYS is not a random hype account. It is one of the more closely watched accounts for model evaluation and serving-stack work, and its linked blog is a dense technical breakdown rather than a short promo page. The article, dated April 25, 2026, describes Day-0 inference and RL support for DeepSeek-V4 and spells out what had to be built around hybrid sparse attention, manifold-constrained hyper-connections, and FP4 expert weights. It also notes a 1M-token context window and claims up to 3x throughput improvement for long-context serving via HiSparse.
Why systems support is the real product here
Benchmark headlines often flatten the hard part of model launches. A new checkpoint matters only if inference kernels, cache strategies, expert routing, and training stacks can keep up. LMSYS’s post is interesting because it treats DeepSeek-V4 as a deployment problem, not just a model artifact. The linked write-up also claims that a fused compression path can reach up to 80% of peak memory bandwidth on H200 and run more than 10x faster than a naive PyTorch pipeline in that stage.
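As a rough sense of scale for the bandwidth claim, the sketch below turns "80% of peak" into an absolute rate and a lower bound on transfer time. The 4.8 TB/s peak figure is an assumption taken from NVIDIA's published H200 specification, not from the LMSYS post, and the 8 GiB buffer size is hypothetical.

```python
# Assumption: 4.8 TB/s peak HBM3e bandwidth, per NVIDIA's published H200 spec.
H200_PEAK_BYTES_PER_S = 4.8e12

# Fraction of peak bandwidth claimed for the fused compression path.
CLAIMED_FRACTION = 0.80


def effective_bandwidth(peak_bytes_per_s: float, fraction: float) -> float:
    """Achieved bandwidth in bytes per second at a given fraction of peak."""
    return peak_bytes_per_s * fraction


def min_transfer_ms(num_bytes: int, bytes_per_s: float) -> float:
    """Lower bound on the time to move `num_bytes` at the given rate, in ms."""
    return num_bytes / bytes_per_s * 1000.0


# Hypothetical amount of data moved by the kernel, for illustration only.
HYPOTHETICAL_BYTES = 8 * 1024**3  # 8 GiB

bw = effective_bandwidth(H200_PEAK_BYTES_PER_S, CLAIMED_FRACTION)
print(f"Effective bandwidth: {bw / 1e12:.2f} TB/s")
print(f"Time to move 8 GiB: {min_transfer_ms(HYPOTHETICAL_BYTES, bw):.2f} ms")
```

With these inputs the claimed fraction corresponds to about 3.84 TB/s. The calculation only bounds a memory-bound stage from below and says nothing about how the actual kernel is implemented.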
What to watch next is whether other open-source serving stacks reproduce these numbers and whether launch-day support turns into stable support once real user traffic arrives. If the LMSYS figures hold, DeepSeek-V4 will be notable not only for its open weights but also for how quickly the surrounding software stack caught up.
Source: LMSYS source tweet · LMSYS technical blog