NVIDIA Vera Rubin NVL72: 5x Blackwell Performance and 10x Lower Inference Cost
Vera Rubin NVL72 Overview
NVIDIA has provided a detailed first look at Vera Rubin NVL72, its next-generation AI computing platform. The rack-scale system houses 72 Rubin GPUs and 36 Vera CPUs, built on NVIDIA's third-generation MGX design. Shipment is targeted for H2 2026.
Core Performance Specs
- Inference: 50 PFLOPS per GPU (NVFP4) — 5x Blackwell GB200
- Training: 35 PFLOPS per GPU (NVFP4)
- Memory bandwidth: 22 TB/s (HBM4) — 2.8x Blackwell
- Scale-up connectivity: 260 TB/s with low latency
- Transistors: 336 billion per Rubin GPU — 1.6x Blackwell
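The per-GPU figures above imply rack-level totals for the 72-GPU system. A quick back-of-envelope sketch (assuming all 72 GPUs contribute linearly, which is an idealization — real scaling depends on workload and interconnect):

```python
# Rack-level totals implied by the per-GPU specs (idealized linear scaling).
GPUS_PER_RACK = 72
INFERENCE_PFLOPS_PER_GPU = 50   # NVFP4
TRAINING_PFLOPS_PER_GPU = 35    # NVFP4
HBM4_TBPS_PER_GPU = 22

rack_inference_eflops = GPUS_PER_RACK * INFERENCE_PFLOPS_PER_GPU / 1000
rack_training_eflops = GPUS_PER_RACK * TRAINING_PFLOPS_PER_GPU / 1000
rack_memory_pbps = GPUS_PER_RACK * HBM4_TBPS_PER_GPU / 1000

print(f"Rack inference: {rack_inference_eflops:.1f} EFLOPS NVFP4")   # 3.6
print(f"Rack training:  {rack_training_eflops:.2f} EFLOPS NVFP4")    # 2.52
print(f"Rack HBM4 BW:   {rack_memory_pbps:.3f} PB/s")                # 1.584
```

These aggregate numbers (roughly 3.6 EFLOPS of NVFP4 inference compute per rack) follow directly from the listed specs; NVIDIA has not been quoted here on rack-level figures, so treat them as derived, not official.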
Cost Efficiency
Vera Rubin reduces the number of GPUs needed to train MoE models by 4x and cuts inference cost per million tokens by 10x versus Blackwell. The system, NVIDIA's first fully liquid-cooled platform, delivers a 10x performance-per-watt improvement even though it consumes roughly twice the power of Blackwell.
Availability
AWS, Google Cloud, Microsoft Azure, OCI, CoreWeave, Lambda, Nebius, and Nscale will offer Vera Rubin instances starting H2 2026. The system comprises 1.3 million components.
Source: CNBC
Related Articles
NVIDIA announced a multigenerational strategic partnership with Meta on February 17, covering millions of Blackwell and Rubin GPUs, the first large-scale Grace CPU deployment, and WhatsApp privacy computing via NVIDIA Confidential Computing.
NVIDIA unveiled the N1 and N1X on February 23, its first consumer SoCs combining Arm CPUs with the Blackwell GPU architecture for AI PCs. Dell, HP, and Lenovo laptops are expected in spring 2026, marking NVIDIA's entry into the PC processor market.
NVIDIA CEO Jensen Huang announced on February 19 that GTC 2026 (March 16–19, San Jose) will feature a surprise chip reveal, fueling speculation about new hardware beyond the Rubin platform.