vLLM Features in the First-Ever MLPerf Vision-Language Benchmark Submission

Original: We're proud to share that @NVIDIA submitted the first-ever MLPerf Vision Language Model (VLM) performance benchmark using vLLM. This achievement showcases the strength of our ongoing collaboration with NVIDIA Engineering. Check out their MLPerf blog and watch our On Demand Talk at GTC to learn more about how we are delivering the best performance on NVIDIA hardware. 🔗 Blog: http://developer.nvidia.com/blog/nvidia-platform-delivers-lowest-token-cost-enabled-by-extreme-co-design/ 🔗 Talk: http://nvidia.com/en-us/on-demand/session/gtc26-s82059/


LLM Apr 10, 2026 By Insights AI 1 min read

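In an X post on April 9, 2026, the vLLM project announced that NVIDIA had submitted the first MLPerf Vision Language Model benchmark using vLLM. According to the NVIDIA Technical Blog linked in the post, Qwen3-VL-235B-A22B is the first multimodal model added to the MLPerf Inference suite, and in v6.0 it was measured in both the offline and server scenarios. For this benchmark NVIDIA reported 79 samples/sec in the offline scenario and 68 queries/sec in the server scenario.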
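The offline and server scenarios differ mainly in how queries arrive, which is why they report different units. As a rough illustration (a simplified simulation under stated assumptions, not MLPerf LoadGen itself), the offline scenario makes every sample available at once and scores throughput in samples/sec, while the server scenario issues queries as a Poisson process at a target rate and scores the sustained queries/sec the system can handle within a latency bound. The function names below are hypothetical, and the 68 QPS target is borrowed from NVIDIA's reported figure purely as an example input.

```python
import random


def offline_arrivals(num_samples: int) -> list[float]:
    """Offline scenario (simplified): every sample is available at t=0,
    so the system under test is free to batch as aggressively as it likes."""
    return [0.0] * num_samples


def server_arrivals(num_queries: int, target_qps: float, seed: int = 0) -> list[float]:
    """Server scenario (simplified): queries arrive as a Poisson process,
    i.e. inter-arrival gaps are exponential with mean 1 / target_qps."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(num_queries):
        t += rng.expovariate(target_qps)  # exponential gap between queries
        times.append(t)
    return times


if __name__ == "__main__":
    print(offline_arrivals(5))
    srv = server_arrivals(10_000, target_qps=68.0)
    # The empirical arrival rate should land close to the target QPS.
    print(f"observed arrival rate ~ {len(srv) / srv[-1]:.1f} queries/sec")
```

In the real benchmark, a server-scenario result only counts if the required percentile of queries meets the latency constraint, which this sketch does not model.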

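Still, reading the announcement as a vLLM-only achievement would be an overstatement. The NVIDIA blog frames the result as part of a larger Blackwell Ultra optimization story, explaining that co-optimizing hardware and open-source software delivered up to 2.7x higher throughput and more than 60% lower token cost on some workloads. Even so, the key point is attribution: NVIDIA states explicitly that it used vLLM for the Qwen3-VL benchmark submission, and distinguishes this from other new benchmarks that used separate tools such as TensorRT-LLM VisualGen.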
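The two headline numbers are arithmetically consistent under one simple assumption. If the hourly cost of the hardware is held fixed, cost per token scales as the inverse of throughput, so a 2.7x throughput gain alone would cut cost per token by 1 - 1/2.7, or about 63%. This is only a back-of-the-envelope sketch: the blog's own token-cost methodology may account for different hardware pricing or workloads, and the function name is hypothetical.

```python
def cost_per_token_reduction(throughput_gain: float) -> float:
    """Fractional drop in cost per token, ASSUMING the hourly cost of the
    hardware stays the same and only tokens/sec changes (hypothetical helper)."""
    if throughput_gain <= 0:
        raise ValueError("throughput_gain must be positive")
    # cost per token = (cost per hour) / (tokens per hour), so it scales as 1 / gain.
    return 1.0 - 1.0 / throughput_gain


if __name__ == "__main__":
    # 2.7x throughput -> cost per token falls to ~37% of baseline, i.e. ~63% lower.
    print(f"{cost_per_token_reduction(2.7):.1%}")  # prints "63.0%"
```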

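This distinction matters for the open-source serving ecosystem. MLPerf remains a benchmark that operators and model-serving teams watch closely, and vLLM's inclusion in the first multimodal track indicates that the project's role is expanding beyond text-only serving into image-heavy inference. It is not evidence that any particular stack is the best choice for every deployment, but the fact that an open-source framework now sits at the center of a top-tier multimodal benchmark is a signal in its own right.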

LLM Reddit Mar 16, 2026 2 min read

LocalLLaMA Benchmark: The Bottleneck on RTX PRO 6000 (SM120) Is a Broken CUTLASS NVFP4 MoE Kernel

A March 12, 2026 LocalLLaMA post claimed that on a 4x RTX PRO 6000 Blackwell setup, the best sustained decode speed for Qwen3.5-397B NVFP4 was 50.5 tok/s, achieved with Marlin. The stated reason is that on SM120 the CUTLASS grouped GEMM path either fails outright or drops to a slow fallback.

#qwen #blackwell #vllm

LLM X/Twitter 9h ago 1 min read

Claude Fable 5 Tops Agentic Work Benchmark with a GDPval-AA Score of 1932

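Claude Fable 5 took first place on GDPval-AA, an agentic knowledge-work benchmark, with a score of 1932. That Anthropic models hold three of the top four spots suggests that competition among models built for long-running work tasks is being reshaped around leaderboard results.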

#anthropic #claude #benchmark

LLM Hacker News 1d ago 1 min read

FrontierCode: An Evaluation That Asks “Would You Merge This Code?” Rather Than “Does It Pass the Tests?”

HN readers were drawn to the idea that coding-model evaluation is shifting from pass rates to code-review quality. FrontierCode focuses on whether a real maintainer would accept the resulting PR.

#coding-agents #benchmark #evals
