DeepMind trains a 12B model across four regions 20x faster
Original: This is Decoupled DiLoCo: our new resilient and flexible way to train advanced AI models across multiple data centres.
In an April 23 thread, Google DeepMind introduced Decoupled DiLoCo as a resilient and flexible way to train advanced AI models across multiple data centres. That framing matters because it targets a growing bottleneck in frontier training: not model quality, but the brittleness of keeping giant clusters synchronized across hardware failures and datacenter boundaries.
The linked blog post puts hard numbers behind the claim. Google DeepMind says Decoupled DiLoCo trained a 12-billion-parameter Gemma model across four separate U.S. regions over 2-5 Gbps wide-area links, and did so more than 20 times faster than conventional synchronization methods. On ML benchmarks, it reached 64.1% average accuracy, nearly matching the 64.4% baseline while using dramatically less bandwidth.
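To give a sense of why 2-5 Gbps links are a tight constraint for a model of this size, here is a rough back-of-envelope sketch. It is not taken from the blog post: it assumes 2 bytes per parameter (bf16), no compression, and a single full copy of the weights sent over one link, and it ignores how a real all-reduce or the Decoupled DiLoCo protocol actually moves data. The function name and constants are illustrative only.

```python
# Back-of-envelope: time to ship one full copy of a 12B-parameter model
# over a wide-area link. All constants below except the parameter count
# and link speeds are assumptions for illustration.

PARAMS = 12e9          # 12 billion parameters (from the article)
BYTES_PER_PARAM = 2    # assumption: bf16 storage, not stated in the source


def transfer_seconds(link_gbps, params=PARAMS, bytes_per_param=BYTES_PER_PARAM):
    """Seconds to send one uncompressed copy of the weights over one link."""
    total_bits = params * bytes_per_param * 8
    return total_bits / (link_gbps * 1e9)


if __name__ == "__main__":
    for gbps in (2, 5):  # the 2-5 Gbps range quoted in the article
        print(f"{gbps} Gbps: {transfer_seconds(gbps):.1f} s per full copy")
```

Under these assumptions a single exchange of the full weights takes on the order of tens of seconds to over a minute, which suggests why synchronizing on every training step is impractical over such links and why methods that communicate far less often are attractive.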
Another important detail is failure tolerance. In simulated large-scale outages, DeepMind says the system held 88% goodput versus 27% for standard data-parallel training. The setup can also combine TPU v6e and TPU v5p in one training run without losing ML performance, which matters for any lab trying to use partially upgraded fleets instead of waiting for perfectly matched clusters. According to the same figures, the required bandwidth across eight datacenters can drop from 198 Gbps to 0.84 Gbps. That is not a small optimization; it is a different assumption about what counts as usable training infrastructure.
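The reported figures are easier to compare as ratios. The short sketch below does only that arithmetic on the numbers quoted above; the variable names are illustrative, and nothing here models how the system itself works.

```python
# Ratios derived from the figures DeepMind reports; names are illustrative.

BANDWIDTH_BEFORE_GBPS = 198.0   # conventional setup, eight datacenters
BANDWIDTH_AFTER_GBPS = 0.84     # Decoupled DiLoCo, eight datacenters
GOODPUT_DILOCO_PCT = 88.0       # simulated large-scale outages
GOODPUT_BASELINE_PCT = 27.0     # standard data-parallel training
ACCURACY_DILOCO_PCT = 64.1      # average benchmark accuracy
ACCURACY_BASELINE_PCT = 64.4    # baseline average accuracy

# How many times less bandwidth is required.
bandwidth_factor = BANDWIDTH_BEFORE_GBPS / BANDWIDTH_AFTER_GBPS

# How many times more useful work survives the simulated outages.
goodput_ratio = GOODPUT_DILOCO_PCT / GOODPUT_BASELINE_PCT

# Accuracy given up, in percentage points.
accuracy_gap_points = ACCURACY_BASELINE_PCT - ACCURACY_DILOCO_PCT

if __name__ == "__main__":
    print(f"Bandwidth reduction: about {bandwidth_factor:.0f}x")
    print(f"Goodput under outages: about {goodput_ratio:.1f}x the baseline")
    print(f"Accuracy gap: {accuracy_gap_points:.1f} percentage points")
```

In other words, the claimed trade is roughly a 236-fold cut in bandwidth and about 3.3 times the goodput under failures, for a 0.3-point drop in average accuracy.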
The GoogleDeepMind account usually uses X to point to research papers, model work, and infrastructure milestones, and this post is clearly in the infrastructure bucket. The next thing to watch is whether Decoupled DiLoCo stays a Gemma-era research result or becomes part of larger production training runs. If it scales beyond the demo numbers, it could reshape how frontier labs think about stranded compute, chip heterogeneity, and failure tolerance.