Google Cloud A4X Max scales AI clusters to 50,000 GPUs

Original: A4X Max bare-metal instances support clusters of up to 50,000 GPUs with double the network bandwidth of previous generations.

AI · Apr 19, 2026 · By Insights AI (Twitter) · 2 min read · Source

What the tweet revealed

Google Cloud Tech put a hard scale number on its newest AI infrastructure: A4X Max bare-metal instances support clusters of up to 50,000 GPUs with double the network bandwidth of the previous generation. That matters because frontier model training and high-throughput inference are limited not only by GPU count, but also by network fabric, placement, quota, and the ability to keep thousands of accelerators fed with data.

The Google Cloud Tech account is the developer-facing channel for Google Cloud how-tos, demos, product updates, and technical docs. In this post, the account did not frame A4X Max as a small instance refresh. It pointed developers to Compute Engine documentation for the A4X Max and A4X machine series, grounding the claim in a concrete infrastructure page rather than leaving it as a short social post.

Context from the docs

The linked documentation places A4X Max and A4X in Google Cloud's accelerator-optimized family for GPU-accelerated AI, ML, and HPC workloads. The docs say A4X Max runs on an exascale platform using NVIDIA GB300 Ultra Superchips with B300 GPUs, while A4X uses GB200 Superchips with B200 GPUs. Both series are built around NVIDIA's NVL72 rack-scale architecture. One NVL72 domain is described as 18 instances and 72 GPUs, with 1,800 GB/s of bidirectional NVLink bandwidth per GPU.
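As a quick sanity check, the quoted topology figures compose cleanly. This is a sketch using only the numbers stated above (18 instances per domain, 4 GPUs per instance, 1,800 GB/s per GPU); the aggregate-bandwidth figure is simple arithmetic, not a number from the docs:

```python
# NVL72 domain figures as quoted from the Compute Engine docs.
INSTANCES_PER_DOMAIN = 18      # instances per NVL72 domain
GPUS_PER_INSTANCE = 4          # each a4x bare-metal instance exposes 4 GPUs
NVLINK_GBPS_PER_GPU = 1_800    # bidirectional NVLink bandwidth per GPU, GB/s

gpus_per_domain = INSTANCES_PER_DOMAIN * GPUS_PER_INSTANCE
print(gpus_per_domain)  # 72, matching the docs' "72 GPUs" per domain

# Aggregate bidirectional NVLink bandwidth across one rack-scale domain.
domain_nvlink_tbps = gpus_per_domain * NVLINK_GBPS_PER_GPU / 1_000
print(domain_nvlink_tbps)  # 129.6 TB/s
```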

The A4X Max details are aimed at foundation model training and serving. Google lists an a4x-maxgpu-4g-metal bare-metal machine type with four B300 GPUs and says A4X Max can provide up to 20 TB of total GPU memory per NVL72 domain, roughly 279 GB per GPU. That memory and networking profile is the useful signal for teams comparing cloud clusters for large context models, mixture-of-experts routing, multimodal training, or dense inference fleets.
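The memory and cluster figures above follow from the domain totals; a minimal check (the decimal-unit assumption is mine, and rounding explains the small gap from the quoted ~279 GB):

```python
import math

DOMAIN_GPU_MEMORY_TB = 20   # "up to 20 TB" of GPU memory per NVL72 domain
GPUS_PER_DOMAIN = 72
CLUSTER_GPUS = 50_000       # the headline cluster ceiling

# Per-GPU share of the domain total (decimal units, as cloud docs use).
per_gpu_gb = DOMAIN_GPU_MEMORY_TB * 1_000 / GPUS_PER_DOMAIN
print(round(per_gpu_gb, 1))  # 277.8 GB, in line with the quoted ~279 GB

# How many full NVL72 domains a 50,000-GPU cluster implies.
print(math.ceil(CLUSTER_GPUS / GPUS_PER_DOMAIN))  # 695 domains
```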

The constraints matter as much as the headline number. The docs show A4X Max and A4X are not ordinary on-demand, Spot, or Flex-start resources; they are available through Future Reservations in AI Hypercomputer. That means the 50,000-GPU claim is less about casual self-service capacity and more about reserved infrastructure for customers planning large runs.
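The Future Reservations pathway the docs point to maps onto the standard Compute Engine CLI. The sketch below is illustrative only: the reservation name, zone, dates, count, and machine-type string are placeholders, and in practice 50,000-GPU-class AI Hypercomputer capacity is typically arranged with Google rather than self-served:

```shell
# Hypothetical future-reservation request; all values are placeholders,
# not confirmed A4X Max zones, dates, or counts.
gcloud compute future-reservations create a4x-max-training-block \
    --zone=us-central1-a \
    --machine-type=a4x-maxgpu-4g-metal \
    --total-count=128 \
    --start-time=2026-06-01T00:00:00Z \
    --end-time=2026-09-01T00:00:00Z
```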

What to watch next is availability by region, reservation lead time, pricing, and how much of the 50,000-GPU ceiling customers can actually use for one job. Reliability data, NCCL behavior across large domains, and integration with GKE or Vertex AI will decide whether the scale number turns into repeatable training throughput.

Source: Google Cloud Tech source tweet · Google Cloud A4X Max docs


© 2026 Insights. All rights reserved.