Papers with Code now has to track “papers without code”

Original: Introducing Papers Without Code [P]

LLM | Jun 12, 2026 | By Insights AI (Reddit)

A post from Hugging Face open-source engineer Niels Rogge on r/MachineLearning framed the relaunch of Papers with Code around a problem the AI community now runs into every week: the most visible benchmark results are not always attached to open code. The post jokingly called the situation “Papers Without Code,” but the underlying issue is serious. Modern leaderboards need to represent closed models without pretending they offer the same verification path as open papers and repos.

According to the post, the relaunched site automatically parses research papers published on arXiv and Hugging Face to surface state-of-the-art results across AI domains, from 3D generation to agents. The example shown was BrowseComp under the agents task, with both scatter plots and tables for each benchmark. The notable addition is support for closed-source model evaluations. Users can include those results in leaderboard views, or disable them with a toggle or account setting.

That design choice is why the Reddit thread is worth more than a product note. Papers with Code historically implied a tight relationship between a result, a paper, and code someone else could inspect or run. In 2026, many frontier results arrive through product posts, model cards, or technical reports. Excluding them would make leaderboards feel incomplete. Mixing them in without labels would blur the difference between an independently reproducible result and a vendor-reported number.

The useful compromise is visibility with metadata. If a leaderboard marks closed results clearly and lets readers filter them out, it can show the competitive landscape while preserving the distinction that researchers care about. The change reflects a broader shift in AI evaluation culture: benchmark pages are no longer just indexes of reproducible papers. They are becoming maps of claims, sources, openness levels, and model availability. That is a harder job than ranking rows by a score.
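The idea of "visibility with metadata" can be illustrated with a minimal sketch. This is not how Papers with Code implements it; the `Result` record, its fields, the `leaderboard` function, and the `include_closed` flag are all hypothetical names chosen for illustration, and the rows are made-up placeholders rather than real benchmark scores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Result:
    """One leaderboard row (hypothetical schema, for illustration only)."""
    model: str
    score: float
    source: str                     # e.g. "paper", "model card", "product post"
    code_url: Optional[str] = None  # None means no public code is attached

    @property
    def is_open(self) -> bool:
        # A row counts as "open" here only if it links to inspectable code.
        return self.code_url is not None


def leaderboard(results: List[Result], include_closed: bool = True) -> List[Result]:
    """Rank rows by score, optionally hiding results without public code."""
    rows = [r for r in results if include_closed or r.is_open]
    return sorted(rows, key=lambda r: r.score, reverse=True)


# Made-up rows: names, scores, and the URL are placeholders.
rows = [
    Result("open-model-a", 41.0, "paper", code_url="https://example.com/repo-a"),
    Result("closed-model-b", 55.0, "product post"),
    Result("open-model-c", 47.5, "paper", code_url="https://example.com/repo-c"),
]

for r in leaderboard(rows, include_closed=True):
    label = "open" if r.is_open else "closed"
    print(f"{r.model:16} {r.score:5.1f}  [{label}, via {r.source}]")
```

Calling `leaderboard(rows, include_closed=False)` drops the closed row while leaving the ranking of the remaining rows unchanged. Keeping the label on every row, rather than in a separate table, is what lets a reader see the full landscape and the reproducible subset from the same data.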
