NVIDIA and Google Cloud push AI factories toward 960,000 Rubin GPUs
Original: NVIDIA and Google Cloud Collaborate to Advance Agentic and Physical AI
The interesting part of NVIDIA and Google Cloud’s new stack is not the partnership itself. Big cloud companies and GPU vendors have been pairing off for years. The real news is the scale they now treat as normal for the agent era: AI factories built on Vera Rubin-powered A5X systems, up to 80,000 Rubin GPUs in a single-site cluster, and up to 960,000 GPUs across a multi-site cluster.
Those numbers matter because AI infrastructure is being re-architected around inference-heavy, long-running agents and physical AI workloads, not just frontier model training. NVIDIA says A5X can deliver up to 10x lower inference cost per token and 10x higher token throughput per megawatt than the prior generation. If that claim holds in production, the economics of deploying agents at enterprise scale change fast. Suddenly the limiting factor is less “can a model do the task?” and more “can you serve millions of tasks cheaply, securely, and close to sensitive data?”
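To see why, run the arithmetic. The sketch below applies the stated 10x factors to a baseline; the baseline price, throughput, and task-size figures are illustrative assumptions, not numbers either company has disclosed.

```python
# Back-of-envelope agent economics under NVIDIA's stated 10x claims.
# Every baseline figure is an illustrative assumption, not a disclosed number.

baseline_cost_per_mtok = 2.00    # assumed $/1M tokens on the prior generation
baseline_tok_per_mw_s = 50_000   # assumed tokens/s per megawatt, prior generation
tokens_per_task = 200_000        # assumed size of one long-running agent task

cost_gain = 10   # stated: up to 10x lower inference cost per token
tput_gain = 10   # stated: up to 10x higher token throughput per megawatt

a5x_cost_per_mtok = baseline_cost_per_mtok / cost_gain
a5x_tok_per_mw_s = baseline_tok_per_mw_s * tput_gain

def cost_per_task(cost_per_mtok: float) -> float:
    """Dollar cost of one agent task at a given $/1M-token rate."""
    return tokens_per_task / 1e6 * cost_per_mtok

def tasks_per_mw_hour(tok_per_mw_s: float) -> float:
    """Agent tasks served by one megawatt-hour at a given throughput."""
    return tok_per_mw_s * 3600 / tokens_per_task

print(f"cost per task:     ${cost_per_task(baseline_cost_per_mtok):.2f} -> "
      f"${cost_per_task(a5x_cost_per_mtok):.2f}")
print(f"tasks per MW-hour: {tasks_per_mw_hour(baseline_tok_per_mw_s):,.0f} -> "
      f"{tasks_per_mw_hour(a5x_tok_per_mw_s):,.0f}")
```

At those assumed numbers, a 200,000-token agent task drops from $0.40 to $0.04, and a megawatt-hour serves 9,000 tasks instead of 900. That is the kind of shift that turns agents from pilots into line items.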
The security and deployment layer is just as important as the raw compute. Google says Gemini running on Blackwell and Blackwell Ultra is in preview on Google Distributed Cloud, while confidential G4 VMs bring Blackwell-based confidential computing into the public cloud. That gives regulated customers a path to keep prompts, models, and fine-tuning data encrypted even while they push more workloads into managed infrastructure. For banks, healthcare groups, manufacturers, and governments, this is the difference between demo AI and deployable AI.
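For teams evaluating that path, provisioning is ordinary Compute Engine plumbing plus a confidential-compute flag. Here is a minimal sketch using the google-cloud-compute Python client; the g4-standard-48 machine type name is an assumption for illustration, and the exact G4 shapes and confidential options should be checked against Google’s documentation.

```python
# Minimal sketch: create a confidential GPU VM with the Compute Engine
# Python client. The machine type name below is an assumed placeholder.
from google.cloud import compute_v1

def create_confidential_g4(project: str, zone: str, name: str) -> None:
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/g4-standard-48",  # assumed name
        confidential_instance_config=compute_v1.ConfidentialInstanceConfig(
            enable_confidential_compute=True,  # memory stays encrypted in use
        ),
        # Confidential VMs cannot live-migrate, so host maintenance terminates.
        scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    source_image="projects/debian-cloud/global/images/family/debian-12",
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    operation = compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    )
    operation.result()  # block until the create operation completes
```

The notable part is how little changes: the confidential config is one field on an otherwise standard instance definition, which is exactly what makes it plausible for regulated workloads.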
Google and NVIDIA are also stitching the open-model ecosystem directly into the stack. Nemotron 3 Super is available on the Gemini Enterprise Agent Platform, and Google introduced a managed reinforcement learning API built with NVIDIA NeMo RL. That matters because the next wave of enterprise agents will not all run on a single frontier model. Teams want a mix of proprietary models, open weights, domain tuning, and workflow-specific reinforcement learning. The platform is being built around that reality.
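What that mix looks like in practice is a router that sends each agent step to the cheapest tier that can handle it. The sketch below is a toy illustration; the model names, prices, and routing policy are placeholders, not platform APIs.

```python
# Toy illustration of the mixed-model pattern: route each agent step to a
# model tier. Names, prices, and policy are placeholders, not platform APIs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelTier:
    name: str
    cost_per_mtok: float        # illustrative $/1M tokens
    call: Callable[[str], str]  # whatever client actually invokes the model

def route(step: str, tiers: dict[str, ModelTier]) -> ModelTier:
    # Toy policy: frontier model for open-ended reasoning, a domain-tuned
    # open-weights model for line-of-business work, a small model otherwise.
    if "plan" in step or "reason" in step:
        return tiers["frontier"]
    if "contract" in step or "claim" in step:
        return tiers["domain"]
    return tiers["small"]

tiers = {
    "frontier": ModelTier("frontier-proprietary", 5.00, lambda p: "..."),
    "domain":   ModelTier("open-weights-domain-tuned", 0.60, lambda p: "..."),
    "small":    ModelTier("small-classifier", 0.05, lambda p: "..."),
}

print(route("classify incoming claim by line of business", tiers).name)
# -> open-weights-domain-tuned
```

Workflow-specific reinforcement learning slots into the same picture: the domain tier is where a managed RL API would sharpen an open-weights model against a team’s own reward signal.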
The last notable shift is physical AI. NVIDIA is tying Omniverse, Isaac Sim, NIM microservices, and Google Cloud Marketplace into one story about digital twins, robotics, design software, and factory optimization. That moves the conversation beyond chat assistants. The pitch here is that the same cloud stack can train code agents, run industrial simulations, and support robots that reason about the world. If that lands, AI infrastructure stops looking like a software line item and starts looking like industrial capacity.
Related Articles
Why it matters: AI infrastructure is moving from single-accelerator rentals to managed clusters that resemble supercomputers. Google Cloud said A4X Max bare-metal instances support up to 50,000 GPUs and twice the network bandwidth of earlier generations.
HN treated TPU 8t and 8i as more than giant datacenter numbers. The thread focused on the bigger shift: agent-era infrastructure is splitting training and inference into separate hardware bets.
On March 17, 2026, the NVIDIADC account on X described Groq 3 LPX as a new rack-scale, low-latency inference accelerator for the Vera Rubin platform. NVIDIA’s March 16 press release and technical blog say LPX brings 256 LPUs, 128 GB of on-chip SRAM, and 640 TB/s of scale-up bandwidth into a heterogeneous inference path with Vera Rubin NVL72 for agentic AI workloads.
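Straight division over those stated totals gives the implied per-LPU figures; this is arithmetic on the published numbers, not an additional disclosure.

```python
# Per-LPU figures implied by the stated rack totals (straight division).
lpus = 256
sram_gb_total = 128
scaleup_tb_s_total = 640

print(f"SRAM per LPU: {sram_gb_total / lpus * 1024:.0f} MB")               # 512 MB
print(f"Scale-up bandwidth per LPU: {scaleup_tb_s_total / lpus:.1f} TB/s")  # 2.5 TB/s
```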