DeepMind trains a 12B model across four regions 20x faster

Original: "This is Decoupled DiLoCo: our new resilient and flexible way to train advanced AI models across multiple data centres."

AI · Apr 25, 2026 · By Insights AI · 1 min read

Google DeepMind used an April 23 thread on X to introduce Decoupled DiLoCo as a resilient and flexible way to train advanced AI models across multiple data centres. That framing matters because it targets a growing frontier bottleneck: not model quality, but the brittleness of keeping giant clusters synchronized across hardware failures and data centre boundaries.

The linked blog post puts hard numbers behind the claim. Google DeepMind says Decoupled DiLoCo trained a 12 billion parameter Gemma model across four separate U.S. regions using 2-5 Gbps wide-area networking, and did so more than 20 times faster than conventional synchronization methods. On benchmarked ML performance, it reached 64.1% average accuracy, almost matching the 64.4% baseline while using dramatically less bandwidth.

Another important detail is failure tolerance. In simulated large-scale outages, DeepMind says the system held 88% goodput versus 27% for standard data-parallel training. The setup can also combine TPU v6e and TPU v5p in one training run without losing ML performance, which matters for any lab trying to use partially upgraded fleets instead of waiting for perfectly matched clusters. The same figure set says required bandwidth can drop from 198 Gbps to 0.84 Gbps across eight data centres, a reduction of roughly 235x. That is not a small optimization; it is a different assumption about what counts as usable training infrastructure.
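The post itself does not spell out the training loop, but the published DiLoCo recipe (Douillard et al.) that this work builds on is public: each worker runs many cheap local optimizer steps on its own shard, then all workers exchange a "pseudo-gradient" (the local weight delta) once per round, which an outer Nesterov-momentum SGD step applies to the shared weights. Synchronizing every H inner steps instead of every step cuts cross-region traffic by roughly a factor of H, which is the mechanism behind bandwidth figures like the ones above. A toy sketch on a quadratic loss, with all hyperparameters (H, worker count, learning rates, momentum) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=8)        # optimum of a toy quadratic loss

def grad(w):
    # gradient of L(w) = 0.5 * ||w - target||^2
    return w - target

WORKERS = 4      # e.g. one worker per region (illustrative)
H = 100          # inner steps between synchronizations (illustrative)
INNER_LR = 0.1
OUTER_LR = 0.7
MOMENTUM = 0.9   # Nesterov-style outer momentum, as in the DiLoCo paper

w_global = np.zeros(8)
velocity = np.zeros(8)

for _round in range(20):
    deltas = []
    for _worker in range(WORKERS):
        w = w_global.copy()
        for _ in range(H):              # cheap local steps: no WAN traffic
            w -= INNER_LR * grad(w)     # stand-in for the inner AdamW updates
        deltas.append(w_global - w)     # "pseudo-gradient": one message per round
    pseudo_grad = np.mean(deltas, axis=0)   # single all-reduce every H steps
    velocity = MOMENTUM * velocity + pseudo_grad
    w_global -= OUTER_LR * (pseudo_grad + MOMENTUM * velocity)  # Nesterov outer step

final_error = float(np.linalg.norm(w_global - target))
print(final_error)  # distance to the optimum shrinks toward 0
```

The design point the sketch makes concrete: the expensive cross-region communication happens once per round rather than once per step, so the WAN only ever carries pseudo-gradients, which is what lets 2-5 Gbps links stand in for datacenter-grade interconnect.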

The GoogleDeepMind account usually uses X to point to research papers, model work, and infrastructure milestones, and this post is clearly in the infrastructure bucket. The next thing to watch is whether Decoupled DiLoCo stays a Gemma-era research result or becomes part of larger production training runs. If it scales beyond the demo numbers, it could reshape how frontier labs think about stranded compute, chip heterogeneity, and failure tolerance.

