NVIDIA Vera Rubin NVL72: 5x Blackwell Performance and 10x Lower Inference Cost

AI | Mar 1, 2026 | By Insights AI

Vera Rubin NVL72 Overview

NVIDIA has provided a detailed first look at Vera Rubin NVL72, its next-generation AI computing platform. The rack-scale system houses 72 Rubin GPUs and 36 Vera CPUs, built on NVIDIA's third-generation MGX design. Shipment is targeted for H2 2026.

Core Performance Specs

  • Inference: 50 PFLOPS per GPU (NVFP4) — 5x Blackwell GB200
  • Training: 35 PFLOPS per GPU (NVFP4)
  • Memory bandwidth: 22 TB/s (HBM4) — 2.8x Blackwell
  • Scale-up connectivity: 260 TB/s with low latency
  • Transistors: 336 billion per Rubin GPU — 1.6x Blackwell
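Taken at face value, the per-GPU figures above imply some rack-level aggregates. A back-of-the-envelope sketch (the aggregate totals are derived here for illustration; only the per-GPU numbers come from the announcement):

```python
# Derive rack-level aggregates from the published per-GPU specs.
# 72 Rubin GPUs per NVL72 rack; aggregate totals are illustrative, not quoted.
GPUS_PER_RACK = 72
INFERENCE_PFLOPS_PER_GPU = 50   # NVFP4
TRAINING_PFLOPS_PER_GPU = 35    # NVFP4
HBM4_BW_TBPS_PER_GPU = 22

rack_inference_eflops = GPUS_PER_RACK * INFERENCE_PFLOPS_PER_GPU / 1000
rack_training_eflops = GPUS_PER_RACK * TRAINING_PFLOPS_PER_GPU / 1000
rack_hbm_bw_pbps = GPUS_PER_RACK * HBM4_BW_TBPS_PER_GPU / 1000

print(f"Inference: {rack_inference_eflops:.2f} EFLOPS NVFP4")   # 3.60
print(f"Training:  {rack_training_eflops:.2f} EFLOPS NVFP4")    # 2.52
print(f"HBM4 BW:   {rack_hbm_bw_pbps:.3f} PB/s aggregate")      # 1.584
```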

Cost Efficiency

NVIDIA says Vera Rubin cuts the number of GPUs needed to train MoE models by 4x and reduces inference cost per million tokens by 10x versus Blackwell. The platform, NVIDIA's first to be fully liquid-cooled, delivers a 10x improvement in performance per watt even though the rack draws roughly twice the power of a Blackwell system.
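What the 10x claim means in absolute terms depends on pricing NVIDIA has not published, so the baseline below is an assumed placeholder, not a real figure. A minimal sketch of the implied savings:

```python
# Illustrative only: NVIDIA quotes a 10x reduction in inference cost per
# million tokens vs. Blackwell, but no absolute prices. The baseline cost
# here is a hypothetical assumption used purely for arithmetic.
BLACKWELL_COST_PER_M_TOKENS = 1.00   # assumed baseline, USD (hypothetical)
COST_REDUCTION_FACTOR = 10           # from the announcement

vera_rubin_cost = BLACKWELL_COST_PER_M_TOKENS / COST_REDUCTION_FACTOR

def inference_cost(tokens: int, cost_per_m_tokens: float) -> float:
    """USD cost to generate `tokens` tokens at a given rate per million."""
    return tokens / 1_000_000 * cost_per_m_tokens

# Serving 1 billion tokens under each platform's (hypothetical) rate:
print(inference_cost(1_000_000_000, BLACKWELL_COST_PER_M_TOKENS))
print(inference_cost(1_000_000_000, vera_rubin_cost))
```

Whatever the real baseline turns out to be, the claimed ratio means a tenfold drop in the per-token serving bill at constant volume.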

Availability

AWS, Google Cloud, Microsoft Azure, OCI, CoreWeave, Lambda, Nebius, and Nscale will offer Vera Rubin instances starting H2 2026. The system comprises 1.3 million components.

Source: CNBC


© 2026 Insights. All rights reserved.