#llm-training

LLM Apr 25, 2026 2 min read

DeepMind's Decoupled DiLoCo chases zero-downtime LLM training

DeepMind is aiming at a stubborn systems problem: one slow or broken learner can still stall an entire pretraining run. The paper claims competitive model quality with strictly zero global downtime in failure-prone simulations spanning millions of chips.

#google-deepmind #diloco #llm-training

LLM Hacker News Apr 8, 2026 2 min read

MegaTrain turns a Hacker News paper pick into a memory-systems debate about single-GPU LLM training

MegaTrain proposes training 100B+ parameter LLMs at full precision on a single GPU by keeping parameters and optimizer states in host memory and streaming layers through the device. The recent Hacker News interest is notable because the paper reframes the problem as one of memory-system design rather than simple GPU count.
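
The layer-streaming idea can be illustrated with a toy sketch. This is not MegaTrain's code: `HostStore`, `Device`, and `streamed_forward` are hypothetical names, each "layer" is just a scale and bias, and the only point is that the full model lives in host memory while the device holds one layer at a time.

```python
from dataclasses import dataclass, field


@dataclass
class HostStore:
    """Hypothetical stand-in for host (CPU) memory holding every layer."""
    layers: list  # each layer is a (scale, bias) tuple


@dataclass
class Device:
    """Hypothetical stand-in for a GPU with room for only `capacity` layers."""
    capacity: int = 1
    resident: dict = field(default_factory=dict)
    peak_resident: int = 0

    def load(self, idx, weights):
        if len(self.resident) >= self.capacity:
            raise MemoryError("device full; evict a layer first")
        self.resident[idx] = weights
        self.peak_resident = max(self.peak_resident, len(self.resident))

    def evict(self, idx):
        del self.resident[idx]


def streamed_forward(host, device, x):
    """Run a forward pass while keeping at most one layer on the device."""
    for idx, weights in enumerate(host.layers):
        device.load(idx, weights)        # copy this layer host -> device
        scale, bias = device.resident[idx]
        x = scale * x + bias             # toy per-layer computation
        device.evict(idx)                # free device memory for the next layer
    return x


host = HostStore(layers=[(2.0, 1.0), (0.5, 0.0), (3.0, -1.0)])
device = Device(capacity=1)
y = streamed_forward(host, device, 1.0)
```

In this toy setup, device memory use depends on the largest single layer rather than on total model size, which is why the bottleneck shifts toward host memory and transfer bandwidth.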

#llm-training #systems #gpu

LLM Reddit Mar 10, 2026 2 min read

LocalLLaMA highlights a 356K-row human code review dataset for training coding models

A LocalLLaMA post pointed to a new Hugging Face dataset of human-written code reviews, pairing before-and-after code changes with inline reviewer comments and negative examples across 37 languages.

#code-review #datasets #github

LLM X/Twitter Mar 9, 2026 2 min read

Karpathy open-sources autoresearch for autonomous single-GPU nanochat experiments

Andrej Karpathy has published autoresearch, a minimal repo that lets AI agents iterate on a stripped-down nanochat training loop overnight. The project turns agent evaluation into a closed-loop research workflow with fixed 5-minute runs, Git branches, and validation-loss-based selection.
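
A minimal sketch of validation-loss-based selection under a fixed time budget, assuming each experiment lives on its own Git branch. This is not the autoresearch code: `RunResult` and `select_best` are hypothetical names, and discarding runs that exceed the budget is an assumption made for illustration.

```python
from dataclasses import dataclass

RUN_BUDGET_SECONDS = 5 * 60  # the fixed 5-minute run length from the summary


@dataclass
class RunResult:
    branch: str      # Git branch holding the agent's change (hypothetical field)
    val_loss: float  # validation loss at the end of the run
    seconds: float   # wall-clock training time


def select_best(results, budget=RUN_BUDGET_SECONDS):
    """Return the within-budget run with the lowest validation loss, or None."""
    # Assumption: runs that overran the budget are not comparable and are dropped.
    eligible = [r for r in results if r.seconds <= budget]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.val_loss)


results = [
    RunResult("exp-a", val_loss=3.42, seconds=298.0),
    RunResult("exp-b", val_loss=3.37, seconds=300.0),
    RunResult("exp-c", val_loss=3.10, seconds=412.0),  # over budget, ignored
]
best = select_best(results)
```

Holding the run length fixed makes validation loss directly comparable across branches, so the selection step reduces to a single `min`.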

#karpathy #agents #open-source

LLM Hacker News Mar 5, 2026 2 min read

NanoGPT Slowrun community debate highlights data-efficient LLM training

A March 4, 2026 Hacker News thread drew attention to Q Labs’ Slowrun benchmark, which fixes the training data at 100M FineWeb tokens and rewards data efficiency under large compute budgets.

#nanogpt #data-efficiency #llm-training

LLM Reddit Feb 21, 2026 2 min read

Reddit discusses arXiv 2602.15322: masked adaptive updates (Magma) for LLM pretraining

A high-engagement r/singularity post pointed to arXiv 2602.15322, which reports that masked adaptive updates and the proposed Magma optimizer can improve 1B-model perplexity versus Adam and Muon with minimal overhead.

#llm-training #optimizers #rmsprop