Four failed replications put r/MachineLearning back on reproducibility

The r/MachineLearning thread started with a small but sharp claim: the poster tried to reproduce 7 feasible paper claims this year, failed on 4, and said 2 had active unresolved GitHub issues. A commenter fairly noted that the post did not link to the papers or repositories involved. Even with that caveat, the thread drew 166 points, apparently because the pattern matched what many ML practitioners have seen.

The highest-ranked response described a familiar review gap. Even when authors share code, reviewers rarely run it, and papers are often judged on whether the idea feels interesting and the story fits. Another commenter pointed at computer vision venues, saying many papers still ship no code, empty repositories, or inference-only examples that are not enough to reproduce training claims.

The discussion became more useful when it turned from complaint to process. One proposal was to make reproducibility an artifact of submission: authors provide code that runs on official servers, installs packages, downloads datasets, trains or fetches weights in a fast mode, evaluates, and emits a report PDF attached to the paper. The blunt version was a single command, make report-from-scratch --fast, with blank reports rejected.
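As a rough illustration of what such a pipeline could look like, here is a minimal Python sketch of the stages the proposal lists. Nothing in it comes from the thread beyond the stage order and the fast-mode idea: the function names, the toy labels, the majority-class "model", and the dictionary report are all assumptions made for illustration, and a real submission would emit a PDF rather than a dictionary.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Stage = Callable[[Context], None]


def install(ctx: Context) -> None:
    # Placeholder: a real pipeline would install pinned dependencies here.
    ctx["installed"] = True


def download(ctx: Context) -> None:
    # Toy labels stand in for a dataset (hypothetical, for illustration only).
    labels = [0, 1, 1, 0, 1, 1, 0, 1]
    # Fast mode uses a small prefix so the whole run finishes quickly.
    ctx["labels"] = labels[:3] if ctx["fast"] else labels


def train(ctx: Context) -> None:
    # The "model" is just the majority class of the training labels.
    labels = ctx["labels"]
    ctx["model"] = max(set(labels), key=labels.count)


def evaluate(ctx: Context) -> None:
    # Accuracy of the majority-class model on the same toy labels.
    labels = ctx["labels"]
    correct = sum(1 for y in labels if y == ctx["model"])
    ctx["metrics"] = {"accuracy": correct / len(labels)}


STAGES: List[Tuple[str, Stage]] = [
    ("install", install),
    ("download", download),
    ("train", train),
    ("evaluate", evaluate),
]


def run_pipeline(fast: bool, stages: List[Tuple[str, Stage]] = STAGES) -> dict:
    """Run every stage in order and collect a report."""
    ctx: Context = {"fast": fast}
    stages_run = []
    for name, stage in stages:
        stage(ctx)
        stages_run.append(name)
    return {"fast": fast, "stages_run": stages_run, "metrics": ctx.get("metrics", {})}


def is_blank(report: dict) -> bool:
    """A report with no evaluation metrics counts as blank and would be rejected."""
    return not report["metrics"]
```

Calling run_pipeline(fast=True) runs all four stages on the shortened toy data, and is_blank flags any run that stops before evaluation. The point of the design is that the report is produced only by executing the stages from scratch, so an empty report is evidence that the code did not run end to end.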

The thread does not prove that most modern ML papers are wrong. The poster’s sample is too small, and the missing links matter. But it captures the operational problem behind the reproducibility debate. ML results can depend on data quirks, preprocessing, seeds, training details, and hardware assumptions that never fully make it into the PDF. As models and benchmarks get more expensive, failed replication costs more than a lost afternoon. What the thread demands is that claims become executable artifacts, not just polished tables.
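One lightweight way to keep those hidden details from getting lost is to write them down next to every result. The sketch below is an assumption about how that might be done, not something proposed in the thread: build_manifest and seeded_sample are hypothetical helpers, and the manifest fields are one possible choice. It uses only the Python standard library.

```python
import hashlib
import json
import platform
import random
import sys


def build_manifest(seed: int, config: dict) -> dict:
    """Record the seed, a hash of the config, and basic environment facts."""
    # Sorting keys makes the hash independent of dictionary ordering.
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    return {
        "seed": seed,
        "config_sha256": hashlib.sha256(config_bytes).hexdigest(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


def seeded_sample(seed: int, n: int) -> list:
    """Draw n pseudo-random numbers from a generator private to this call."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

A manifest like this only covers what it records. Libraries such as NumPy and PyTorch keep their own random state, and some GPU operations are not deterministic even with fixed seeds, so a real project would need to extend the idea to every source of randomness it uses.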

Related Articles

GeneBench-Pro turns biology-agent testing into 129 hard problems

BMS turns eight Vera Rubin racks into a drug-discovery AI factory

Meta models move into a 100,000-images-per-second science bottleneck
