DeepMind is aiming at a stubborn systems problem: one slow or broken learner can still stall an entire pretraining run. The paper claims competitive model quality with strictly zero global downtime in failure-prone simulations spanning millions of chips.
#llm-training
Training a frontier model across far-flung data centers usually means paying a brutal synchronization tax. DeepMind says Decoupled DiLoCo cuts cross-site bandwidth from 198 Gbps to 0.84 Gbps in its eight-datacenter setup while holding benchmark accuracy near the baseline at 64.1%.
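The blurb doesn't spell out where the bandwidth savings come from, but DiLoCo-style methods generally run many local optimizer steps per site and only synchronize a parameter delta every H steps. A minimal sketch of that pattern, with a toy quadratic objective standing in for real training (all function names and constants here are illustrative, not from the paper):

```python
import numpy as np

# Sketch of a DiLoCo-style low-communication loop: each site runs H local
# steps, then only the parameter delta crosses the wide-area link, so
# cross-site traffic shrinks by roughly a factor of H.

def local_steps(params, grad_fn, lr=0.1, H=100):
    """Run H inner optimizer steps entirely inside one datacenter."""
    p = params.copy()
    for _ in range(H):
        p -= lr * grad_fn(p)
    return p

def outer_sync(global_params, site_params, outer_lr=0.7):
    """Average per-site deltas and apply them as an outer 'pseudo-gradient'."""
    deltas = [sp - global_params for sp in site_params]
    return global_params + outer_lr * np.mean(deltas, axis=0)

# Toy objective: each site minimizes ||p - target||^2 with its own target.
targets = [np.array([1.0, 2.0]), np.array([3.0, 0.0])]
params = np.zeros(2)
for _ in range(5):  # 5 outer rounds = 5 cross-site messages, not 500
    sites = [local_steps(params, lambda p, t=t: 2 * (p - t)) for t in targets]
    params = outer_sync(params, sites)
print(params)  # converges toward the mean target [2.0, 1.0]
```

The key design choice in this family of methods is treating the averaged delta as a gradient for a separate outer optimizer, so the inner loops can diverge freely between syncs.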
MegaTrain proposes training 100B+ parameter LLMs at full precision on a single GPU by keeping parameters and optimizer states in host memory and streaming layers through the device. The recent Hacker News interest is notable because the paper reframes the problem as one of memory-system design rather than simple GPU count.
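The core mechanism described, keeping weights in host memory and streaming a small working set of layers through the GPU, can be sketched with dummy objects in place of real tensors. The class and function names below are mine, not MegaTrain's API, and the `window` eviction policy is an assumption:

```python
from collections import deque

# Hypothetical sketch of layer streaming: all weights live in host RAM and
# at most `window` layers occupy the device at any moment.

class HostLayer:
    """Layer whose weights reside in host memory between uses."""
    def __init__(self, idx):
        self.weight = float(idx)  # stand-in for a real parameter tensor

    def to_device(self):
        return DeviceLayer(self.weight)  # stand-in for an H2D copy

class DeviceLayer:
    """Transient on-device copy of a layer."""
    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return x + self.weight  # stand-in for the real layer compute

def streamed_forward(layers, x, window=2):
    """Forward pass keeping at most `window` layers resident on device.

    A real implementation would prefetch layer i+1 on a copy stream while
    layer i computes, overlapping PCIe transfers with GPU work.
    """
    resident = deque()
    for layer in layers:
        resident.append(layer.to_device())  # bring next layer onto device
        if len(resident) > window:
            resident.popleft()  # evict oldest; master weights stay on host
        x = resident[-1].forward(x)
    return x

model = [HostLayer(i) for i in range(100)]  # 100 layers, none pinned to GPU
out = streamed_forward(model, 0.0)
print(out)  # sum of 0..99 = 4950.0
```

Peak device memory in this scheme scales with the window size rather than the model size, which is what makes the single-GPU framing plausible as a memory-system problem.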
Anthropic said on February 23, 2026 that DeepSeek, Moonshot AI, and MiniMax carried out industrial-scale distillation attacks against Claude. The company framed model-output extraction as a security and platform integrity problem, not just a competitive concern.
A LocalLLaMA post pointed to a new Hugging Face dataset of human-written code reviews, pairing before-and-after code changes with inline reviewer comments and negative examples across 37 languages.
Andrej Karpathy has published autoresearch, a minimal repo that lets AI agents iterate on a stripped-down nanochat training loop overnight. The project turns agent evaluation into a closed-loop research workflow with fixed 5-minute runs, Git branches, and validation-loss-based selection.
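The closed loop described (fixed-budget runs, one branch per candidate, selection by validation loss) reduces to a simple tournament. A minimal sketch, with a toy loss surface standing in for the real 5-minute nanochat runs and all names invented for illustration:

```python
import random

# Hypothetical sketch of an autoresearch-style selection round: each agent
# proposal gets one fixed-budget training run, and the branch with the
# lowest validation loss is kept.

def train_and_eval(hparams, seed=0):
    """Stand-in for a fixed-budget run returning validation loss."""
    rng = random.Random(seed)
    lr = hparams["lr"]
    # Toy loss surface: best near lr = 0.01, plus run-to-run noise.
    return (lr - 0.01) ** 2 * 1e4 + rng.uniform(0.0, 0.05)

def research_round(candidates):
    """One overnight round: evaluate every branch, keep the best."""
    results = [(train_and_eval(h, seed=i), h) for i, h in enumerate(candidates)]
    results.sort(key=lambda r: r[0])
    return results[0]

branches = [{"lr": lr} for lr in (0.003, 0.01, 0.03, 0.1)]
best_loss, best_hparams = research_round(branches)
print(best_hparams)  # the lr = 0.01 branch wins on this toy surface
```

Fixing the run budget and seed handling matters here: without a constant compute envelope, the selection step would reward longer runs rather than better ideas.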
A March 4, 2026 Hacker News thread surfaced Q Labs’ Slowrun benchmark, which fixes training data at 100M FineWeb tokens and rewards data efficiency under large compute budgets.
A high-engagement r/singularity post pointed to arXiv 2602.15322, which reports that masked adaptive updates and the proposed Magma optimizer can improve 1B-model perplexity versus Adam and Muon with minimal overhead.
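One plausible reading of "masked adaptive updates" is an Adam-style step applied only to a subset of coordinates, with optimizer state frozen elsewhere. The sketch below shows that pattern; it is my interpretation, not the paper's actual Magma algorithm, and it omits Adam's bias correction for brevity:

```python
import numpy as np

# Hedged sketch of a masked adaptive update: second-moment scaling as in
# Adam, but entries where mask == False keep both parameters and optimizer
# state frozen for that step. Bias correction is omitted for brevity.

def masked_adam_step(p, g, m, v, mask, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One optimizer step updating only the masked coordinates."""
    m = np.where(mask, b1 * m + (1 - b1) * g, m)        # first moment
    v = np.where(mask, b2 * v + (1 - b2) * g * g, v)    # second moment
    p = np.where(mask, p - lr * m / (np.sqrt(v) + eps), p)
    return p, m, v

p = np.ones(4)
m = np.zeros(4)
v = np.zeros(4)
g = np.full(4, 0.5)
mask = np.array([1, 0, 1, 0], dtype=bool)
p, m, v = masked_adam_step(p, g, m, v, mask)
print(p)  # only coordinates 0 and 2 move; 1 and 3 stay at 1.0
```

The claimed appeal of this family of updates is that the mask makes each step cheaper and sparser while the adaptive scaling is preserved on the coordinates that do move.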