vLLM Lands in the First MLPerf Vision-Language Benchmark Submission

Original: We're proud to share that @NVIDIA submitted the first-ever MLPerf Vision Language Model (VLM) performance benchmark using vLLM. This achievement showcases the strength of our ongoing collaboration with NVIDIA Engineering. Check out their MLPerf blog and watch our On Demand Talk at GTC to learn more about how we are delivering the best performance on NVIDIA hardware. 🔗 Blog: http://developer.nvidia.com/blog/nvidia-platform-delivers-lowest-token-cost-enabled-by-extreme-co-design/ 🔗 Talk: http://nvidia.com/en-us/on-demand/session/gtc26-s82059/

LLM · Apr 10, 2026 · By Insights AI · 1 min read

In an April 9 X post, the vLLM project said NVIDIA submitted the first-ever MLPerf Vision Language Model benchmark using vLLM. The linked NVIDIA Technical Blog says the Qwen3-VL-235B-A22B test is the first multimodal model added to the MLPerf Inference suite, with offline and server scenarios included in v6.0. NVIDIA reported 79 samples per second in offline mode and 68 queries per second in server mode for that benchmark entry.

The broader NVIDIA post is not a vLLM-only announcement. It positions the VLM result inside a larger Blackwell Ultra performance story, saying continuous co-optimization across hardware and open-source software produced up to 2.7x throughput gains and more than 60% lower cost per token on the same infrastructure for some workloads. But the ecosystem detail that matters is the attribution: NVIDIA says the Qwen3-VL submission used the vLLM framework, while other newly added benchmarks relied on separate tools such as TensorRT-LLM VisualGen.
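The two headline figures are roughly consistent with each other: at a fixed infrastructure cost per hour, cost per token scales inversely with token throughput, so a 2.7x throughput gain alone implies about 1 − 1/2.7 ≈ 63% lower cost per token, matching the "more than 60%" claim. A minimal sketch of that arithmetic (the dollar and throughput figures are illustrative assumptions, not numbers from NVIDIA's post):

```python
def cost_per_million_tokens(cluster_cost_per_hour: float,
                            tokens_per_second: float) -> float:
    """Cost to generate 1M tokens at a fixed hourly infrastructure cost."""
    tokens_per_hour = tokens_per_second * 3600
    return cluster_cost_per_hour / tokens_per_hour * 1_000_000

# Illustrative assumptions: a $100/hour cluster serving 10,000 tokens/s.
baseline = cost_per_million_tokens(100.0, 10_000)
optimized = cost_per_million_tokens(100.0, 10_000 * 2.7)  # 2.7x throughput

reduction = 1 - optimized / baseline
print(f"baseline:  ${baseline:.2f} per 1M tokens")
print(f"optimized: ${optimized:.2f} per 1M tokens")
print(f"reduction: {reduction:.0%}")  # ~63%, consistent with "more than 60%"
```

The reduction depends only on the throughput ratio, not on the assumed hourly cost, which is why NVIDIA can state it independently of any specific deployment's pricing.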

That matters because MLPerf still has outsized signaling power for operators and model-serving teams. If vLLM is now part of the first multimodal track in the suite, the project’s role is widening beyond text-only serving into image-heavy and mixed-modality inference. The result does not prove that one stack wins every deployment, but it does show that open-source serving frameworks are no longer peripheral to top-tier multimodal benchmarking. They are now part of the benchmark headline itself.


© 2026 Insights. All rights reserved.