NVIDIA Launches Rubin Platform: 10x Lower Inference Cost, 4x Fewer Training GPUs
Rubin Platform Ships H2 2026
NVIDIA announced Rubin, its next-generation AI platform. Rubin is already in full production, and Rubin-based products will be available from partners in the second half of 2026.
Dramatic Performance Gains Over Blackwell
The Rubin platform achieves the following through extreme co-design across hardware and software:
- 10x lower inference token cost: Generates tokens at roughly one-tenth the cost of Blackwell (see the sketch after this list)
- 4x fewer GPUs for MoE training: Trains Mixture-of-Experts models with one-quarter the GPU count
- Six new chips: Includes the Rubin GPU, Vera CPU, and networking chips
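To put the headline multipliers in concrete terms, here is a minimal Python sketch. The baseline cost and GPU-count figures are hypothetical placeholders invented for illustration; only the 10x and 4x factors come from NVIDIA's announcement:

```python
# A minimal arithmetic sketch of the headline claims. The baseline figures
# below are hypothetical placeholders, NOT NVIDIA numbers; only the 10x and
# 4x factors come from the announcement.

blackwell_cost_per_m_tokens = 1.00    # hypothetical: $1.00 per 1M tokens on Blackwell
blackwell_moe_training_gpus = 10_000  # hypothetical: GPUs for one MoE training run

rubin_cost_per_m_tokens = blackwell_cost_per_m_tokens / 10  # claimed 10x lower token cost
rubin_moe_training_gpus = blackwell_moe_training_gpus // 4  # claimed 1/4 the GPU count

print(f"Inference:    ${blackwell_cost_per_m_tokens:.2f} -> ${rubin_cost_per_m_tokens:.2f} per 1M tokens")
print(f"MoE training: {blackwell_moe_training_gpus:,} -> {rubin_moe_training_gpus:,} GPUs")
```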
Key Cloud Partners
The first partners set to deploy Vera Rubin-based instances and systems in 2026:
- Major clouds: AWS, Google Cloud, Microsoft, OCI
- NVIDIA Cloud Partners: CoreWeave, Lambda, Nebius, Nscale
- Server vendors: Cisco, Dell, HPE, Lenovo, Supermicro
Consumer GPUs Skipped in 2026
Meanwhile, NVIDIA will reportedly skip gaming GPU releases in 2026: the RTX 50 Super and RTX 60 series are said to be delayed by memory shortages and the profit-margin gap between product lines.
AI chips reportedly carry 65% profit margins versus 40% for graphics cards, a gap that is driving NVIDIA's strategic shift of production capacity toward AI.
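A rough sketch of what that margin gap means per unit of shared production capacity; the revenue figure is a hypothetical placeholder, and only the 65% and 40% margins come from the report:

```python
# Hedged illustration of the reported margin gap. The revenue value is a
# hypothetical placeholder; only the 65% and 40% margins are from the report.

def gross_profit(revenue: float, margin: float) -> float:
    """Gross profit for a given revenue and gross-margin fraction."""
    return revenue * margin

revenue = 100_000.0  # hypothetical revenue from one unit of production capacity

ai_profit = gross_profit(revenue, 0.65)      # AI accelerators: 65% reported margin
gaming_profit = gross_profit(revenue, 0.40)  # gaming GPUs: 40% reported margin

print(f"AI chips: ${ai_profit:,.0f}  Gaming: ${gaming_profit:,.0f}")
print(f"Per unit of capacity, AI yields {ai_profit / gaming_profit:.2f}x the gross profit")
```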
Strengthening AI Infrastructure Dominance
The Rubin platform launch positions NVIDIA to extend its dominance in AI infrastructure beyond 2026. The claimed inference cost reduction would be especially significant for LLM service providers.
Source: NVIDIA Newsroom, TrendForce