NVIDIA Vera Rubin Platform Launches with 75% GPU Reduction for MoE, 10x Inference Cost Cut
Vera Rubin Unveiled at CES 2026
NVIDIA announced its next-generation AI platform, Vera Rubin, at CES 2026. At its heart is the Vera Rubin superchip, which combines one Vera CPU and two Rubin GPUs in a single package and serves as the core of the six-chip Rubin platform.
Revolutionary Performance Gains
NVIDIA reports that the Rubin platform delivers the following improvements over Blackwell systems:
- MoE model training: 4x fewer GPUs needed to train the same model (a 75% reduction)
- Inference token cost: 10x reduction
The platform is particularly optimized for large-scale Mixture-of-Experts (MoE) models such as GPT-4, Llama 4 Maverick, and DeepSeek V4.
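NVIDIA has not published how Rubin schedules MoE workloads. As background on why MoE models are singled out, here is a minimal sketch of the top-k expert routing that defines the architecture: each token activates only k of the available experts, so only a fraction of the parameters do work per token. All names and values below are illustrative, not from NVIDIA.

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# 8 experts, but only k=2 are active per token: 2/8 = 25% of the expert
# parameters are exercised, the kind of sparsity MoE training and
# inference systems exploit. (For the same reason, a 4x reduction in
# GPU count is the same claim as a 75% decrease: 1 - 1/4 = 0.75.)
routes = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
```

Each returned pair is an expert index and its mixing weight; the weights of the selected experts sum to 1.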
Targeting Agentic AI and Reasoning Models
NVIDIA framed the Rubin platform as ideal for agentic AI, advanced reasoning models, and MoE models, reflecting the core trends in the AI industry for 2026.
Release Timeline and Partners
The Rubin platform is in full production, and Rubin-based products will be available from partners in the second half of 2026. Major cloud providers (AWS, Google Cloud, Microsoft Azure) and server manufacturers are preparing Rubin-based offerings.
Gaming GPU Hiatus in 2026
Meanwhile, NVIDIA reportedly does not plan to release a new gaming graphics chip this year, marking the first time in 30 years that the company will go a full calendar year without a significant GeForce refresh. The reported cause is a deepening global memory shortage, which is pushing NVIDIA to prioritize its limited memory supply for AI accelerators.
VibeTensor Open Source Release
NVIDIA also released VibeTensor, a PyTorch-style deep learning runtime whose implementation was generated by LLM coding agents. It is open-sourced under the Apache 2.0 license and targets Linux x86_64, with NVIDIA GPUs and CUDA as hard requirements.
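NVIDIA has not documented VibeTensor's API here. As an illustration of what "PyTorch-style" usually means in practice (values that record the operations applied to them and can backpropagate gradients), here is a minimal scalar reverse-mode autograd sketch; every name in it is hypothetical and is not VibeTensor's actual interface.

```python
class Scalar:
    """Minimal reverse-mode autograd value, in the style PyTorch popularized."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents  # (parent, local_gradient) pairs

    def __mul__(self, other):
        return Scalar(self.value * other.value,
                      parents=((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Scalar(self.value + other.value,
                      parents=((self, 1.0), (other, 1.0)))

    def backward(self, upstream=1.0):
        # Accumulate this node's gradient, then push it to the inputs
        # scaled by each input's local derivative (the chain rule).
        self.grad += upstream
        for parent, local in self._parents:
            parent.backward(upstream * local)

x = Scalar(3.0)
y = Scalar(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
```

A real runtime adds n-dimensional tensors, a recorded graph instead of recursion, and GPU kernels, but the user-facing shape (build expressions, then call `backward()`) is the same.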
Related Articles
NVIDIA unveiled its next-gen AI platform Rubin, delivering 10x reduction in inference token cost and 4x fewer GPUs for MoE model training vs. Blackwell. Launch planned for H2 2026.
NVIDIA announced the Rubin platform at CES 2026 in January. The platform comprises six new chips, and the Vera Rubin superchip delivers 5x higher inference performance than GB200. Major AI companies including OpenAI, Meta, and Microsoft plan to adopt it.
NVIDIA outlined a Rubin-based DGX SuperPOD architecture that combines compute, networking, and operations software as one deployment stack. The company claims up to 10x lower inference token cost versus the prior generation and targets availability in the second half of 2026.