NVIDIA and Google Cloud push AI factories toward 960,000 Rubin GPUs
Original: NVIDIA and Google Cloud Collaborate to Advance Agentic and Physical AI
The interesting part of NVIDIA and Google Cloud’s new stack is not the partnership itself. Big cloud companies and GPU vendors have been pairing off for years. The real news is the scale they now treat as normal for the agent era: AI factories built on Vera Rubin-powered A5X systems, up to 80,000 Rubin GPUs in a single-site cluster, and up to 960,000 GPUs across a multi-site cluster.
Those numbers matter because AI infrastructure is being re-architected around inference-heavy, long-running agents and physical AI workloads, not just frontier model training. NVIDIA says A5X can deliver up to 10x lower inference cost per token and 10x higher token throughput per megawatt than the prior generation. If that claim holds in production, the economics of deploying agents at enterprise scale change fast. Suddenly the limiting factor is less “can a model do the task?” and more “can you serve millions of tasks cheaply, securely, and close to sensitive data?”
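To see why, run the arithmetic. The sketch below applies the stated 10x factors to a baseline; the baseline price, throughput, and task-size figures are illustrative assumptions, not numbers either company has disclosed.

```python
# Back-of-envelope agent economics under NVIDIA's stated 10x claims.
# Every baseline figure is an illustrative assumption, not a disclosed number.

baseline_cost_per_mtok = 2.00    # assumed $/1M tokens on the prior generation
baseline_tok_per_mw_s = 50_000   # assumed tokens/s per megawatt, prior generation
tokens_per_task = 200_000        # assumed size of one long-running agent task

cost_gain = 10   # stated: up to 10x lower inference cost per token
tput_gain = 10   # stated: up to 10x higher token throughput per megawatt

a5x_cost_per_mtok = baseline_cost_per_mtok / cost_gain
a5x_tok_per_mw_s = baseline_tok_per_mw_s * tput_gain

def cost_per_task(cost_per_mtok: float) -> float:
    """Dollar cost of one agent task at a given $/1M-token rate."""
    return tokens_per_task / 1e6 * cost_per_mtok

def tasks_per_mw_hour(tok_per_mw_s: float) -> float:
    """Agent tasks served by one megawatt-hour at a given throughput."""
    return tok_per_mw_s * 3600 / tokens_per_task

print(f"cost per task:     ${cost_per_task(baseline_cost_per_mtok):.2f} -> "
      f"${cost_per_task(a5x_cost_per_mtok):.2f}")
print(f"tasks per MW-hour: {tasks_per_mw_hour(baseline_tok_per_mw_s):,.0f} -> "
      f"{tasks_per_mw_hour(a5x_tok_per_mw_s):,.0f}")
```

At those assumed numbers, a 200,000-token agent task drops from $0.40 to $0.04, and a megawatt-hour serves 9,000 tasks instead of 900. That is the kind of shift that turns agents from pilots into line items.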
The security and deployment layer is just as important as the raw compute. Google says Gemini running on Blackwell and Blackwell Ultra is in preview on Google Distributed Cloud, while confidential G4 VMs bring Blackwell-based confidential computing into the public cloud. That gives regulated customers a path to keep prompts, models, and fine-tuning data encrypted even while they push more workloads into managed infrastructure. For banks, healthcare groups, manufacturers, and governments, this is the difference between demo AI and deployable AI.
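For teams evaluating that path, provisioning is ordinary Compute Engine plumbing plus a confidential-compute flag. Here is a minimal sketch using the google-cloud-compute Python client; the g4-standard-48 machine type name is an assumption for illustration, and the exact G4 shapes and confidential options should be checked against Google’s documentation.

```python
# Minimal sketch: create a confidential GPU VM with the Compute Engine
# Python client. The machine type name below is an assumed placeholder.
from google.cloud import compute_v1

def create_confidential_g4(project: str, zone: str, name: str) -> None:
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/g4-standard-48",  # assumed name
        confidential_instance_config=compute_v1.ConfidentialInstanceConfig(
            enable_confidential_compute=True,  # memory stays encrypted in use
        ),
        # Confidential VMs cannot live-migrate, so host maintenance terminates.
        scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    source_image="projects/debian-cloud/global/images/family/debian-12",
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    operation = compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    )
    operation.result()  # block until the create operation completes
```

The notable part is how little changes: the confidential config is one field on an otherwise standard instance definition, which is exactly what makes it plausible for regulated workloads.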
Google and NVIDIA are also stitching the open-model ecosystem directly into the stack. Nemotron 3 Super is available on the Gemini Enterprise Agent Platform, and Google introduced a managed reinforcement learning API built with NVIDIA NeMo RL. That matters because the next wave of enterprise agents will not all run on a single frontier model. Teams want a mix of proprietary models, open weights, domain tuning, and workflow-specific reinforcement learning. The platform is being built around that reality.
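What that mix looks like in practice is a router that sends each agent step to the cheapest tier that can handle it. The sketch below is a toy illustration; the model names, prices, and routing policy are placeholders, not platform APIs.

```python
# Toy illustration of the mixed-model pattern: route each agent step to a
# model tier. Names, prices, and policy are placeholders, not platform APIs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelTier:
    name: str
    cost_per_mtok: float        # illustrative $/1M tokens
    call: Callable[[str], str]  # whatever client actually invokes the model

def route(step: str, tiers: dict[str, ModelTier]) -> ModelTier:
    # Toy policy: frontier model for open-ended reasoning, a domain-tuned
    # open-weights model for line-of-business work, a small model otherwise.
    if "plan" in step or "reason" in step:
        return tiers["frontier"]
    if "contract" in step or "claim" in step:
        return tiers["domain"]
    return tiers["small"]

tiers = {
    "frontier": ModelTier("frontier-proprietary", 5.00, lambda p: "..."),
    "domain":   ModelTier("open-weights-domain-tuned", 0.60, lambda p: "..."),
    "small":    ModelTier("small-classifier", 0.05, lambda p: "..."),
}

print(route("classify incoming claim by line of business", tiers).name)
# -> open-weights-domain-tuned
```

Workflow-specific reinforcement learning slots into the same picture: the domain tier is where a managed RL API would sharpen an open-weights model against a team’s own reward signal.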
The last notable shift is physical AI. NVIDIA is tying Omniverse, Isaac Sim, NIM microservices, and Google Cloud Marketplace into one story about digital twins, robotics, design software, and factory optimization. That moves the conversation beyond chat assistants. The pitch here is that the same cloud stack can train code agents, run industrial simulations, and support robots that reason about the world. If that lands, AI infrastructure stops looking like a software line item and starts looking like industrial capacity.
Related Articles
Why it matters: AI infrastructure is moving from single-accelerator rentals to managed clusters that resemble supercomputers. Google Cloud said A4X Max bare-metal instances support up to 50,000 GPUs and twice the network bandwidth of earlier generations.
HN treated TPU 8t and 8i as more than giant datacenter numbers. The thread focused on the bigger shift: agent-era infrastructure is splitting training and inference into separate hardware bets.
On March 17, 2026, the NVIDIADC account on X described Groq 3 LPX as a new rack-scale, low-latency inference accelerator for the Vera Rubin platform. NVIDIA’s March 16 press release and technical blog say LPX brings 256 LPUs, 128 GB of on-chip SRAM, and 640 TB/s of scale-up bandwidth into a heterogeneous inference path with Vera Rubin NVL72 for agentic AI workloads.
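Straight division over those stated totals gives the implied per-LPU figures; this is arithmetic on the published numbers, not an additional disclosure.

```python
# Per-LPU figures implied by the stated rack totals (straight division).
lpus = 256
sram_gb_total = 128
scaleup_tb_s_total = 640

print(f"SRAM per LPU: {sram_gb_total / lpus * 1024:.0f} MB")               # 512 MB
print(f"Scale-up bandwidth per LPU: {scaleup_tb_s_total / lpus:.1f} TB/s")  # 2.5 TB/s
```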