NanoGPT Slowrun community debate highlights data-efficient LLM training
Original: NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute
Why This HN Post Drew Attention
Hacker News users pushed NanoGPT Slowrun to the front page on March 4, 2026 (UTC). At crawl time, the submission had a score of 116 and 24 comments. The linked post from Q Labs proposes a simple but unusual benchmark: hold data fixed at 100M FineWeb tokens, allow large compute budgets, and optimize for validation loss rather than wall-clock speed.
The original write-up: qlabs.sh/slowrun. Open repo: github.com/qlabs-eng/slowrun. HN discussion: item 47251259.
Core Technical Claim
The project argues that current scaling practice is likely to hit a data bottleneck before a compute bottleneck, so optimization targets should change. Instead of adding more tokens, Slowrun focuses on algorithmic changes that improve data efficiency under fixed-data conditions. Q Labs reports an initial baseline around 2.4x data efficiency relative to modded-nanogpt, then an update to 5.5x after community pull requests in the first week.
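One common way to read multipliers like 2.4x and 5.5x is as a token-count ratio: how many tokens the baseline needs to reach the same validation loss that the new recipe reaches with 100M. A minimal sketch, assuming that definition (the exact metric Q Labs uses may differ):

```python
def data_efficiency(baseline_tokens: int, method_tokens: int) -> float:
    """Ratio of tokens the baseline needs vs. the new method to reach
    the same target validation loss. This is one common definition of
    data efficiency; the benchmark's precise metric is an assumption here."""
    return baseline_tokens / method_tokens

# Illustrative numbers only: if modded-nanogpt needed 550M tokens to match
# a loss the new recipe reaches with the fixed 100M-token budget, that
# would correspond to the reported 5.5x figure.
print(data_efficiency(550_000_000, 100_000_000))  # → 5.5
```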
What Changed in Early Iterations
- Per-epoch shuffling in multi-epoch training.
- Learned projections for value embeddings.
- Swapping the activation function from squared ReLU to SwiGLU.
- Model ensembling experiments.
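The first two flavors of change above can be sketched in a few lines. This is an illustrative NumPy sketch, not the repo's actual code: `epoch_order` draws a fresh permutation of the fixed corpus each epoch instead of replaying one order, and `relu_squared` / `swiglu` show the two activations being compared (weight shapes and names are assumptions for illustration).

```python
import numpy as np

def epoch_order(num_sequences: int, epoch: int, seed: int = 0) -> np.ndarray:
    """Per-epoch shuffling: derive a fresh, reproducible permutation of the
    fixed corpus for each epoch, rather than reusing one fixed order."""
    return np.random.default_rng(seed + epoch).permutation(num_sequences)

def relu_squared(x: np.ndarray) -> np.ndarray:
    """Squared ReLU: max(x, 0)^2, the activation used before the swap."""
    return np.maximum(x, 0.0) ** 2

def swiglu(x: np.ndarray, w_gate: np.ndarray, w_up: np.ndarray) -> np.ndarray:
    """SwiGLU: SiLU(x @ W_gate) * (x @ W_up). The gate/up projection
    split is standard for SwiGLU; exact dimensions here are illustrative."""
    g = x @ w_gate
    return (g / (1.0 + np.exp(-g))) * (x @ w_up)
```

Note that per-epoch shuffling only matters under multi-epoch training, which is exactly the regime a fixed 100M-token budget with a large compute budget forces.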
The authors also list open directions such as second-order optimizers, natural-gradient methods, curriculum learning, diffusion models, and alternatives to standard gradient descent.
Community Discussion Themes
Commenters highlighted overlap with recent "limited data, high compute" pretraining research, asked whether the baseline choice favors certain techniques, and raised the risk of overfitting or memorization when repeatedly training on a small corpus. Others argued that this benchmark is valuable precisely because it inverts the usual speed-centric objective and exposes methods that are expensive but potentially more sample efficient.
Why It Matters for LLM Engineering
Even if the current benchmark is narrow, it offers a practical testbed for methods teams usually postpone due to throughput pressure. If similar gains hold across broader datasets and model scales, workflows that prioritize data efficiency could become a meaningful complement to standard scale-up playbooks.
Related Articles
A LocalLLaMA post pointed to a new Hugging Face dataset of human-written code reviews, pairing before-and-after code changes with inline reviewer comments and negative examples across 37 languages.
Andrej Karpathy has published autoresearch, a minimal repo that lets AI agents iterate on a stripped-down nanochat training loop overnight. The project turns agent evaluation into a closed-loop research workflow with fixed 5-minute runs, Git branches, and validation-loss-based selection.
OpenAI announced an Operator upgrade adding slide creation and editing in Google Drive and Jupyter-mode code execution in the browser. It also said Operator availability expanded to 20 additional regions in recent weeks, with new country additions including Korea and several European markets.