NVIDIA Vera Rubin Platform Launches with 75% GPU Reduction for MoE, 10x Inference Cost Cut
Vera Rubin Unveiled at CES 2026
NVIDIA announced its next-generation AI platform, Vera Rubin, at CES 2026. The Vera Rubin superchip combines one Vera CPU and two Rubin GPUs in a single processor and serves as the core of the six-chip Rubin platform.
Revolutionary Performance Gains
NVIDIA reports that the Rubin platform delivers the following improvements over Blackwell systems:
- MoE Model Training: 4x fewer GPUs needed to train the same model (a 75% decrease)
- Inference Token Costs: 10x lower cost per token (a 90% decrease)
The platform is optimized in particular for large-scale Mixture-of-Experts (MoE) models such as GPT-4, Llama 4 Maverick, and DeepSeek V4.
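The "Nx reduction" figures quoted above map onto percentage decreases with simple arithmetic: an N-times reduction leaves 1/N of the original, so the decrease is (1 - 1/N). The sketch below only illustrates that conversion. The function name `percent_decrease` is hypothetical, and the inputs are NVIDIA's claimed factors, not measured results.

```python
def percent_decrease(reduction_factor: float) -> float:
    """Convert an 'N-times reduction' into the equivalent percentage decrease."""
    if reduction_factor <= 0:
        raise ValueError("reduction factor must be positive")
    # An N-times reduction leaves 1/N of the original quantity.
    return (1 - 1 / reduction_factor) * 100


# Factors as claimed by NVIDIA for Rubin relative to Blackwell.
claimed_factors = [("MoE training GPUs", 4), ("Inference token cost", 10)]

for label, factor in claimed_factors:
    print(f"{label}: {factor}x reduction = {percent_decrease(factor):.0f}% decrease")
```

Read this way, the 4x training figure corresponds to the 75% in the headline, and the 10x inference figure corresponds to a 90% lower cost per token.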
Targeting Agentic AI and Reasoning Models
NVIDIA framed the Rubin platform as ideal for agentic AI, advanced reasoning models, and MoE models, reflecting the core trends in the AI industry for 2026.
Release Timeline and Partners
The Rubin platform is in full production, and Rubin-based products will be available from partners in the second half of 2026. Major cloud providers (AWS, Google Cloud, Microsoft Azure) and server manufacturers are preparing Rubin-based offerings.
Gaming GPU Hiatus in 2026
Meanwhile, NVIDIA reportedly does not plan to release a new graphics chip for gaming this year, which would mark the first time in 30 years that the company has gone a full calendar year without a significant GeForce refresh. The reported cause is a deepening global memory shortage, which is pushing NVIDIA to prioritize its limited memory supply for AI accelerators.
VibeTensor Open Source Release
NVIDIA also released VibeTensor, a PyTorch-style deep learning runtime whose implementation was generated by LLM coding agents. It is open-sourced under the Apache 2.0 license and targets Linux x86_64, with an NVIDIA GPU and CUDA as hard requirements.
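As a rough illustration of what those hard requirements mean in practice, the sketch below checks a machine against them using only the Python standard library. It is not part of VibeTensor and does not use any VibeTensor API. The function name `unmet_requirements` is hypothetical, and treating `nvidia-smi` and `nvcc` on the PATH as evidence of an NVIDIA driver and a CUDA toolkit is a heuristic assumption, not an official check.

```python
import platform
import shutil


def unmet_requirements(system: str, machine: str,
                       has_nvidia_smi: bool, has_nvcc: bool) -> list[str]:
    """Return human-readable reasons the host misses the stated requirements."""
    problems = []
    if system != "Linux":
        problems.append(f"OS is {system!r}, expected 'Linux'")
    if machine != "x86_64":
        problems.append(f"architecture is {machine!r}, expected 'x86_64'")
    # Heuristic (assumption): nvidia-smi on PATH suggests an NVIDIA driver is installed.
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found on PATH (no NVIDIA driver detected)")
    # Heuristic (assumption): nvcc on PATH suggests a CUDA toolkit is installed.
    if not has_nvcc:
        problems.append("nvcc not found on PATH (no CUDA toolkit detected)")
    return problems


# Inspect the current host.
issues = unmet_requirements(
    platform.system(),
    platform.machine(),
    shutil.which("nvidia-smi") is not None,
    shutil.which("nvcc") is not None,
)
if issues:
    print("Host does not meet the stated requirements:")
    for issue in issues:
        print(f"  - {issue}")
else:
    print("Host appears to meet the stated requirements.")
```

Keeping the check a pure function of its arguments makes it easy to test without access to a GPU machine. The actual build and runtime requirements are defined by the VibeTensor repository itself.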