Four failed replications put r/MachineLearning back on reproducibility
Original: Failure to Reproduce Modern Paper Claims [D]
The r/MachineLearning thread started with a small but sharp claim: the poster tried to reproduce 7 paper claims that looked feasible this year, failed on 4, and said 2 of the failures had active, unresolved GitHub issues. One commenter fairly noted that the post did not include links to the source material. Even with that caveat, the thread drew 166 points because the pattern matched what many ML practitioners have seen.
The highest-ranked response described a familiar review gap. Even when authors share code, reviewers rarely run it, and papers are often judged on whether the idea feels interesting and the story fits. Another commenter pointed at computer vision venues, saying many papers still ship no code, empty repositories, or inference-only examples that are not enough to reproduce training claims.
The discussion became more useful when it turned from complaint to process. One proposal was to make reproducibility an artifact of submission: authors provide code that runs on official servers, installs packages, downloads datasets, trains or fetches weights in a fast mode, evaluates, and emits a report PDF attached to the paper. The blunt version was make report-from-scratch --fast; blank reports would be rejected.
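The proposed pipeline can be sketched as a single entry point that runs each stage and emits a report, rejecting blank ones. This is a minimal illustration, not anything from the thread: every stage body is a stub, and names like run_stage and report.txt are hypothetical.

```python
# Sketch of the thread's proposed submission artifact: one command that
# installs, fetches data, trains in a fast mode, evaluates, and emits a
# report. All stage bodies are placeholder stubs; a real artifact would
# run the actual steps on the venue's servers.
from pathlib import Path

def run_stage(name, fn):
    """Run one pipeline stage and prefix its output lines for the report."""
    return [f"[{name}] {line}" for line in fn()]

# Hypothetical stubs standing in for the real, expensive steps.
def install():    return ["pip install -r requirements.txt (stub)"]
def fetch_data(): return ["downloaded dataset (stub)"]
def train():      return ["trained 1 epoch on small subset (stub, fast mode)"]
def evaluate():   return ["metric on fast split: 0.90 (stub)"]

def report_from_scratch(out=Path("report.txt")):
    """Run all stages, write the report, and reject if it would be blank."""
    lines = []
    for name, fn in [("install", install), ("data", fetch_data),
                     ("train", train), ("eval", evaluate)]:
        lines += run_stage(name, fn)
    if not lines:
        # The thread's rule: a blank report means the submission is rejected.
        raise SystemExit("REJECT: blank report")
    out.write_text("\n".join(lines) + "\n")
    return out

if __name__ == "__main__":
    print(report_from_scratch().read_text())
```

The design choice worth noting is that the report is a byproduct of actually executing the stages, not a form the authors fill in, which is what would make it hard to fake.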
The thread does not prove that most modern ML papers are wrong. The poster’s sample is too small, and the missing links matter. But it captures the operational problem behind the reproducibility debate. ML results can depend on data quirks, preprocessing, seeds, training details, and hardware assumptions that never fully make it into the PDF. As models and benchmarks get more expensive, failed replication costs more than a lost afternoon. The community energy here is a demand that claims become executable artifacts, not just polished tables.
Related Articles
OpenAI says ChatGPT is already being used at research scale across science and mathematics. In its January 2026 report, the company says advanced science and math usage reached nearly 8.4 million weekly messages from roughly 1.3 million weekly users, with early evidence that GPT-5.2 is contributing to serious mathematical work.
Anthropic said on March 23, 2026 that it is launching a Science Blog focused on how AI is changing research practice and scientific discovery. The new blog will publish feature stories, workflow guides, and field notes, while also highlighting Anthropic's broader AI-for-science programs.
Google DeepMind said on X on March 12, 2026 that a new podcast for AlphaGo’s tenth anniversary explores how methods first sharpened in games now feed into scientific discovery. The post lines up with DeepMind’s March 10 essay arguing that AlphaGo’s search, planning, and reinforcement ideas now influence work in biology, mathematics, weather, and algorithms.