Karpathy open-sources autoresearch for autonomous single-GPU nanochat experiments
What Karpathy published
On March 7, Andrej Karpathy said he had packaged his recent autoresearch work into a self-contained repository that others could try over a weekend. The tweet describes a stripped-down single-GPU version of the nanochat training core in which the human edits Markdown instructions and the AI agent edits the Python training code. Instead of prompting for one-off answers, the project puts an agent in a loop that proposes code changes, runs training, measures the outcome, and keeps iterating.
How the repo works
The GitHub page describes autoresearch as AI agents running research on single-GPU nanochat training automatically. Each experiment is budgeted at exactly five minutes, which works out to roughly 12 runs per hour and about 100 overnight. The agent works on a Git feature branch, accumulates commits, and selects for lower validation loss rather than for subjective impressions. Karpathy's framing is that the human should stop hand-editing the training loop and instead program the research organization itself through files such as program.md.
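The loop described above, greedy selection on validation loss across fixed-budget runs, can be sketched in a few lines. This is a hypothetical illustration, not the repo's actual code: the function names (`autoresearch_loop`, `propose_and_measure`) and the stand-in agent are invented here, and each call stands in for one five-minute training run committed on the feature branch.

```python
# Hypothetical sketch of the selection loop described above; the real
# repository's code and naming differ. Each "run" stands in for one
# five-minute training experiment on the agent's Git feature branch.

def autoresearch_loop(baseline_val_loss, propose_and_measure, num_runs):
    """Greedy selection: keep a proposed code change only if it lowers
    validation loss; otherwise discard it and retry from the best state."""
    best = baseline_val_loss
    accepted = []
    for run in range(num_runs):
        # Agent edits the training code, trains briefly, measures val loss.
        candidate = propose_and_measure(best)
        if candidate < best:
            best = candidate        # in the real loop: commit the change
            accepted.append(run)
    return best, accepted

# Deterministic stand-in for an agent: alternates a regression (+0.05)
# with a small improvement (-0.01) so selection behavior is visible.
def fake_agent(best):
    fake_agent.calls += 1
    return best - 0.01 if fake_agent.calls % 2 == 0 else best + 0.05
fake_agent.calls = 0

# Roughly one hour of simulated experiments at 12 runs per hour.
final_loss, kept_runs = autoresearch_loop(1.00, fake_agent, num_runs=12)
```

The point of the sketch is the objective function: progress is defined purely as "did validation loss go down," which is what makes the runs comparable and the selection mechanical rather than a matter of taste.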
The repository is intentionally minimal. Karpathy says the training core is compressed into about 630 lines for a single-GPU setup, which makes the loop easier for an agent to inspect and modify. The README also notes that the current version expects one NVIDIA GPU, while forks can extend the idea to other platforms. That scope choice matters because it keeps the benchmark small enough to iterate quickly but still real enough to test whether an agent can improve a nontrivial training system.
Why this matters
The broader significance is not the specific nanochat baseline. It is the attempt to make autonomous research measurable, cheap, and repeatable. Fixed five-minute runs, Git-native versioning, and validation-loss-based selection create a cleaner testbed for comparing prompts, agents, and coordination strategies. If projects like this mature, the relevant question for research teams shifts from "can an agent write code?" to "can an agent run a disciplined experimental program that compounds over time?"
Sources: Karpathy X post, GitHub