Google Cloud A4X Max scales AI clusters to 50,000 GPUs
Original: A4X Max bare-metal instances support clusters of up to 50,000 GPUs with double the network bandwidth of previous generations
What the tweet revealed
Google Cloud Tech put a hard scale number on its newest AI infrastructure: A4X Max bare-metal instances support clusters of up to 50,000 GPUs with double the network bandwidth. That is material because frontier model training and high-throughput inference are limited not only by GPU count, but also by network fabric, placement, quota, and the ability to keep thousands of accelerators fed with data.
The Google Cloud Tech account is the developer-facing channel for Google Cloud how-tos, demos, product updates, and technical docs. In this post, the account did not frame A4X Max as a small instance refresh. It pointed developers to the Compute Engine documentation for the A4X Max and A4X machine series, anchoring the tweet to a concrete infrastructure page rather than leaving it as a bare social claim.
Context from the docs
The linked documentation places A4X Max and A4X in Google Cloud's accelerator-optimized family for GPU-accelerated AI, ML, and HPC workloads. The docs say A4X Max runs on an exascale platform using NVIDIA GB300 Ultra Superchips with B300 GPUs, while A4X uses GB200 Superchips with B200 GPUs. Both series are built around NVIDIA's NVL72 rack-scale architecture. One NVL72 domain is described as 18 instances and 72 GPUs, with 1,800 GBps bidirectional NVLink bandwidth per GPU.
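Those figures compose in a straightforward way. The sketch below is a back-of-the-envelope calculation using only the numbers quoted above: the domain shape and per-GPU NVLink bandwidth come from the docs, and the 50,000-GPU ceiling comes from the tweet. Nothing here is an additional published spec.

```python
import math

# Figures quoted above: domain shape and per-GPU NVLink bandwidth are from
# the Compute Engine docs; the 50,000-GPU ceiling is from the tweet.
GPUS_PER_DOMAIN = 72           # one NVL72 domain
INSTANCES_PER_DOMAIN = 18
NVLINK_GBPS_PER_GPU = 1_800    # bidirectional NVLink bandwidth per GPU
CLUSTER_GPU_CEILING = 50_000   # scale claim from the tweet

gpus_per_instance = GPUS_PER_DOMAIN // INSTANCES_PER_DOMAIN
domain_nvlink_tbps = GPUS_PER_DOMAIN * NVLINK_GBPS_PER_GPU / 1_000
domains_at_ceiling = math.ceil(CLUSTER_GPU_CEILING / GPUS_PER_DOMAIN)

print(f"GPUs per instance:                {gpus_per_instance}")            # 4
print(f"Aggregate NVLink per domain:      {domain_nvlink_tbps:.1f} TBps")  # 129.6
print(f"NVL72 domains at the 50k ceiling: ~{domains_at_ceiling}")          # ~695
```

Since 50,000 is not an exact multiple of 72, a full-scale cluster would span roughly 695 NVL72 domains. Traffic inside a domain rides NVLink; traffic between domains rides the scale-out network, which is where the doubled network bandwidth in the headline actually bites.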
The A4X Max details are aimed at foundation model training and serving. Google lists an a4x-maxgpu-4g-metal bare-metal machine type with four B300 GPUs and says A4X Max can provide up to 20 TB of total GPU memory per NVL72 domain, roughly 279 GB per GPU. That memory and networking profile is the useful signal for teams comparing cloud clusters for large context models, mixture-of-experts routing, multimodal training, or dense inference fleets.
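To make the memory figure concrete, here is a minimal sizing sketch. The 20 TB per-domain pool and the 72-GPU domain size are from the docs; the parameter counts and the two-bytes-per-parameter (bf16) assumption are hypothetical, and the check deliberately ignores KV cache, activations, and optimizer state.

```python
# Rough memory-fit check for one NVL72 domain. The domain figures are from
# the A4X Max docs; the model sizes below are illustrative assumptions.
DOMAIN_MEMORY_GB = 20_000   # "up to 20 TB of total GPU memory per NVL72 domain"
GPUS_PER_DOMAIN = 72
PER_GPU_GB = DOMAIN_MEMORY_GB / GPUS_PER_DOMAIN  # ~278 GB; docs say roughly 279

def weights_fit(params_billions: float, bytes_per_param: float = 2.0) -> bool:
    """Do the raw weights fit in one domain's pooled GPU memory?

    Ignores KV cache, activations, and optimizer state, so this is a
    floor for serving and a hard underestimate for training."""
    weights_gb = params_billions * bytes_per_param  # 1B params at 1 B/param = 1 GB
    return weights_gb <= DOMAIN_MEMORY_GB

print(f"Per-GPU share: {PER_GPU_GB:.0f} GB")
print("1T params,  bf16:", weights_fit(1_000))   # True  (~2 TB of weights)
print("10T params, bf16:", weights_fit(10_000))  # True  (right at the 20 TB pool)
```

Training multiplies that footprint several times over for gradients and optimizer state, so the practical headroom is smaller. The point is that the per-domain pool, not the per-GPU slice, is the number to plan model parallelism around.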
The constraints matter as much as the headline number. The docs show A4X Max and A4X are not ordinary on-demand, Spot, or Flex-start resources; they are available through Future Reservations in AI Hypercomputer. That means the 50,000-GPU claim is less about casual self-service capacity and more about reserved infrastructure for customers planning large runs.
What to watch next: availability by region, reservation lead times, pricing, and how much of the 50,000-GPU ceiling customers can actually use for a single job. Reliability data, NCCL behavior across large domains, and integration with GKE and Vertex AI will decide whether the scale number turns into repeatable training throughput.
Source: Google Cloud Tech tweet · Google Cloud A4X Max docs