DeepMind trains a 12B model across four regions 20x faster

Original: "This is Decoupled DiLoCo: our new resilient and flexible way to train advanced AI models across multiple data centres."

AI · Apr 25, 2026 · By Insights AI · 1 min read

Google DeepMind used an April 23 thread on X to introduce Decoupled DiLoCo as a resilient and flexible way to train advanced AI models across multiple data centres. That framing matters because it targets a growing frontier bottleneck: not model quality, but the brittleness of keeping giant clusters synchronized across hardware failures and data centre boundaries.

The linked blog post puts hard numbers behind the claim. Google DeepMind says Decoupled DiLoCo trained a 12 billion parameter Gemma model across four separate U.S. regions using 2-5 Gbps wide-area networking, and did so more than 20 times faster than conventional synchronization methods. On benchmarked ML performance, it reached 64.1% average accuracy, almost matching the 64.4% baseline while using dramatically less bandwidth.

Another important detail is failure tolerance. In simulated large-scale outages, DeepMind says the system held 88% goodput versus 27% for standard data-parallel training. The setup can also combine TPU v6e and TPU v5p in one training run without losing ML performance, which matters for any lab trying to use partially upgraded fleets instead of waiting for perfectly matched clusters. The same figure set says required bandwidth can drop from 198 Gbps to 0.84 Gbps across eight data centres, a reduction of roughly 235x. That is not a small optimization; it is a different assumption about what counts as usable training infrastructure.
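The post itself does not spell out the training loop, but the published DiLoCo recipe (Douillard et al.) that this work builds on is public: each worker runs many cheap local optimizer steps on its own shard, then all workers exchange a "pseudo-gradient" (the local weight delta) once per round, which an outer Nesterov-momentum SGD step applies to the shared weights. Synchronizing every H inner steps instead of every step cuts cross-region traffic by roughly a factor of H, which is the mechanism behind bandwidth figures like the ones above. A toy sketch on a quadratic loss, with all hyperparameters (H, worker count, learning rates, momentum) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=8)        # optimum of a toy quadratic loss

def grad(w):
    # gradient of L(w) = 0.5 * ||w - target||^2
    return w - target

WORKERS = 4      # e.g. one worker per region (illustrative)
H = 100          # inner steps between synchronizations (illustrative)
INNER_LR = 0.1
OUTER_LR = 0.7
MOMENTUM = 0.9   # Nesterov-style outer momentum, as in the DiLoCo paper

w_global = np.zeros(8)
velocity = np.zeros(8)

for _round in range(20):
    deltas = []
    for _worker in range(WORKERS):
        w = w_global.copy()
        for _ in range(H):              # cheap local steps: no WAN traffic
            w -= INNER_LR * grad(w)     # stand-in for the inner AdamW updates
        deltas.append(w_global - w)     # "pseudo-gradient": one message per round
    pseudo_grad = np.mean(deltas, axis=0)   # single all-reduce every H steps
    velocity = MOMENTUM * velocity + pseudo_grad
    w_global -= OUTER_LR * (pseudo_grad + MOMENTUM * velocity)  # Nesterov outer step

final_error = float(np.linalg.norm(w_global - target))
print(final_error)  # distance to the optimum shrinks toward 0
```

The design point the sketch makes concrete: the expensive cross-region communication happens once per round rather than once per step, so the WAN only ever carries pseudo-gradients, which is what lets 2-5 Gbps links stand in for datacenter-grade interconnect.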

The GoogleDeepMind account usually uses X to point to research papers, model work, and infrastructure milestones, and this post is clearly in the infrastructure bucket. The next thing to watch is whether Decoupled DiLoCo stays a Gemma-era research result or becomes part of larger production training runs. If it scales beyond the demo numbers, it could reshape how frontier labs think about stranded compute, chip heterogeneity, and failure tolerance.

