#distributed-systems

LLM Apr 25, 2026 2 min read

DeepMind's Decoupled DiLoCo chases zero-downtime LLM training

DeepMind is aiming at a stubborn systems problem: one slow or broken learner can still stall an entire pretraining run. The paper claims competitive model quality with strictly zero global downtime in failure-prone simulations spanning millions of chips.

#google-deepmind #diloco #llm-training

AI Hacker News Apr 15, 2026 2 min read

HN keeps coming back to one point: multi-agent coding is a distributed-systems problem

HN latched onto a post arguing that the real bottleneck in multi-agent coding is coordination, not model IQ alone. Once work is split across agents, the old distributed-systems vocabulary starts showing up whether the models are brilliant or not.

#agents #distributed-systems #software-engineering