NVIDIA and Google Cloud push AI factories toward 960,000 Rubin GPUs
Original: NVIDIA and Google Cloud Collaborate to Advance Agentic and Physical AI
The interesting part of NVIDIA and Google Cloud’s new stack is not the partnership itself. Big cloud companies and GPU vendors have been pairing off for years. The real news is the scale they now treat as normal for the agent era: AI factories built on Vera Rubin-powered A5X systems, up to 80,000 Rubin GPUs in a single site cluster, and up to 960,000 across a multisite cluster.
Those numbers matter because AI infrastructure is being re-architected around inference-heavy, long-running agents and physical AI workloads, not just frontier model training. NVIDIA says A5X can deliver up to 10x lower inference cost per token and 10x higher token throughput per megawatt than the prior generation. If that claim holds in production, the economics of deploying agents at enterprise scale change fast. Suddenly the limiting factor is less “can a model do the task?” and more “can you serve millions of tasks cheaply, securely, and close to sensitive data?”
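To make the economics concrete, here is a back-of-the-envelope sketch in Python. Only the "up to 10x" factor comes from NVIDIA's claim. The baseline price per million tokens, the task volume, and the tokens-per-task figure are hypothetical placeholders chosen for illustration, and the function name is made up for this example. Because the claim is a best case, the result is a ceiling on the savings, not a forecast.

```python
# Rough serving-cost arithmetic. Every input except the 10x factor is a
# hypothetical placeholder, not a figure published by NVIDIA or Google.

def monthly_serving_cost(tasks_per_month: int,
                         tokens_per_task: int,
                         cost_per_million_tokens: float) -> float:
    """Return the monthly inference bill for a fleet of agent tasks."""
    total_tokens = tasks_per_month * tokens_per_task
    return total_tokens / 1_000_000 * cost_per_million_tokens


CLAIMED_FACTOR = 10            # "up to 10x" lower cost per token (vendor claim)
BASELINE_COST_PER_M = 2.00     # hypothetical prior-generation cost, USD per 1M tokens
TASKS_PER_MONTH = 5_000_000    # hypothetical agent workload
TOKENS_PER_TASK = 20_000       # hypothetical; long-running agents consume many tokens

baseline = monthly_serving_cost(TASKS_PER_MONTH, TOKENS_PER_TASK, BASELINE_COST_PER_M)
best_case = monthly_serving_cost(
    TASKS_PER_MONTH, TOKENS_PER_TASK, BASELINE_COST_PER_M / CLAIMED_FACTOR
)

print(f"baseline:  ${baseline:,.0f} per month")
print(f"best case: ${best_case:,.0f} per month")
```

Under these made-up inputs, the same workload falls from roughly $200,000 to roughly $20,000 a month. The throughput-per-megawatt claim works the same way: a site with a fixed power budget would serve up to ten times as many tokens, so capacity stops being bounded by the power bill at the same point.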
The security and deployment layer is just as important as the raw compute. Google says Gemini running on Blackwell and Blackwell Ultra is in preview on Google Distributed Cloud, while confidential G4 VMs bring Blackwell-based confidential computing into the public cloud. That gives regulated customers a path to keep prompts, models, and fine-tuning data encrypted even while they push more workloads into managed infrastructure. For banks, healthcare groups, manufacturers, and governments, this is the difference between demo AI and deployable AI.
Google and NVIDIA are also stitching the open-model ecosystem directly into the stack. Nemotron 3 Super is available on Gemini Enterprise Agent Platform, and Google introduced a managed reinforcement learning API built with NVIDIA NeMo RL. That matters because the next wave of enterprise agents will not all run on a single frontier model. Teams want a mix of proprietary models, open weights, domain tuning, and workflow-specific reinforcement learning. The platform is being built around that reality.
The last notable shift is physical AI. NVIDIA is tying Omniverse, Isaac Sim, NIM microservices, and Google Cloud Marketplace into one story about digital twins, robotics, design software, and factory optimization. That moves the conversation beyond chat assistants. The pitch here is that the same cloud stack can train code agents, run industrial simulations, and support robots that reason about the world. If that lands, AI infrastructure stops looking like a software line item and starts looking like industrial capacity.