NVIDIA Vera Rubin Platform Launches with 75% GPU Reduction for MoE, 10x Inference Cost Cut
Vera Rubin Unveiled at CES 2026
NVIDIA announced its next-generation AI platform, Vera Rubin, at CES 2026. The Vera Rubin superchip combines one Vera CPU and two Rubin GPUs in a single processor and serves as the core of the six-chip Rubin platform.
Revolutionary Performance Gains
NVIDIA reports that the Rubin platform delivers the following improvements over Blackwell systems:
- MoE Model Training: 4x reduction in the number of GPUs needed to train the same model (75% decrease)
- Inference Token Costs: 10x reduction (90% decrease)
The platform is optimized in particular for large-scale Mixture-of-Experts (MoE) models such as GPT-4, Llama 4 Maverick, and DeepSeek V4.
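To make the relationship between the multiplicative and percentage figures concrete, here is a minimal sketch that converts an "Nx reduction" into the equivalent percentage decrease. The function names and the 1,000-GPU baseline are hypothetical and chosen only for illustration; they are not NVIDIA figures.

```python
def reduction_to_percent(factor: float) -> float:
    """Convert an 'Nx reduction' into the equivalent percentage decrease."""
    if factor < 1:
        raise ValueError("a reduction factor must be at least 1")
    # A 4x reduction leaves 1/4 of the original, so 3/4 (75%) is removed.
    return 100 * (factor - 1) / factor


def after_reduction(baseline: float, factor: float) -> float:
    """Return what remains of a baseline quantity after an 'Nx reduction'."""
    if factor < 1:
        raise ValueError("a reduction factor must be at least 1")
    return baseline / factor


# Reported factors from the article.
gpu_percent = reduction_to_percent(4)     # MoE training GPU count
token_percent = reduction_to_percent(10)  # inference token cost

# Hypothetical baseline: a training run that needed 1,000 GPUs before.
gpus_needed = after_reduction(1000, 4)

print(f"4x reduction  = {gpu_percent:.0f}% decrease")
print(f"10x reduction = {token_percent:.0f}% decrease")
print(f"1,000 GPUs at 4x reduction -> {gpus_needed:.0f} GPUs")
```

Under these assumptions, a 4x reduction corresponds to the 75% decrease in the headline, and a 10x reduction corresponds to a 90% decrease.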
Targeting Agentic AI and Reasoning Models
NVIDIA framed the Rubin platform as ideal for agentic AI, advanced reasoning models, and MoE models, reflecting the core trends in the AI industry for 2026.
Release Timeline and Partners
The Rubin platform is in full production, and Rubin-based products will be available from partners in the second half of 2026. Major cloud providers (AWS, Google Cloud, Microsoft Azure) and server manufacturers are preparing Rubin-based offerings.
Gaming GPU Hiatus in 2026
Meanwhile, NVIDIA reportedly does not plan to release a new graphics chip for gaming this year, marking the first time in 30 years that the company will go a full calendar year without a significant GeForce refresh. The decision is attributed to a deepening global memory shortage, which is pushing NVIDIA to prioritize its limited memory supply for AI accelerators.
VibeTensor Open Source Release
NVIDIA also released VibeTensor, a PyTorch-style deep learning runtime whose implementation was generated by LLM coding agents. It is open-sourced under the Apache 2.0 license and targets Linux x86_64, with NVIDIA GPUs and CUDA as hard requirements.
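A rough way to check a machine against those stated requirements before trying the project is sketched below. It does not use any VibeTensor API. The function names are hypothetical, and treating the presence of `nvidia-smi` and `nvcc` on the PATH as evidence of an NVIDIA driver and a CUDA toolkit is an assumption made here, not a check the project documents.

```python
import platform
import shutil


def unmet_requirements(system: str, machine: str,
                       has_nvidia_smi: bool, has_nvcc: bool) -> list[str]:
    """Return a list of unmet requirements; an empty list means all passed."""
    problems = []
    if system != "Linux":
        problems.append(f"operating system is {system!r}, expected 'Linux'")
    if machine != "x86_64":
        problems.append(f"architecture is {machine!r}, expected 'x86_64'")
    # Assumption: nvidia-smi on PATH suggests an NVIDIA driver is installed.
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found (no NVIDIA driver detected)")
    # Assumption: nvcc on PATH suggests a CUDA toolkit is installed.
    if not has_nvcc:
        problems.append("nvcc not found (no CUDA toolkit detected)")
    return problems


def probe_host() -> list[str]:
    """Collect facts about the current machine and check them."""
    return unmet_requirements(
        platform.system(),
        platform.machine(),
        shutil.which("nvidia-smi") is not None,
        shutil.which("nvcc") is not None,
    )


if __name__ == "__main__":
    issues = probe_host()
    if issues:
        print("Host does not meet the stated requirements:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("Host appears to meet the stated requirements.")
```

Keeping the check separate from the probing step makes the logic easy to test with made-up inputs, without needing a GPU machine.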
Related Articles
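NVIDIA has announced Rubin, its next-generation AI platform. Compared with Blackwell, it delivers a 10x reduction in inference token costs and a 4x reduction in the number of GPUs needed for MoE model training, with release scheduled for the second half of 2026.
Google has agreed to pay SpaceX $920M per month from October 2026 through June 2029 to use roughly 110,000 NVIDIA GPUs and related compute resources. With demand for Gemini Enterprise exceeding expectations, even Google, a leader in in-house infrastructure, is procuring external AI compute for the short term.
NAVER is expanding GAK Sejong to 55MW and pursuing a gigawatt-class AI factory over the long term. The NVIDIA Newsroom post presented DSX-based sovereign AI infrastructure and the advancement of HyperCLOVA X as the core pillars.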