Papers with Code now has to track “papers without code”

A post from Hugging Face open-source engineer Niels Rogge on r/MachineLearning framed the relaunch of Papers with Code around a problem the AI community now runs into every week: the most visible benchmark results are not always attached to open code. The post jokingly called the situation “Papers Without Code,” but the underlying issue is serious. Modern leaderboards need to represent closed models without pretending they offer the same verification path as open papers and repos.

According to the post, the relaunched site automatically parses research papers published on arXiv and Hugging Face to surface state-of-the-art results across AI domains, from 3D generation to agents. The post's example was BrowseComp, listed under the agents task, and each benchmark is presented as both a scatter plot and a table. The notable addition is support for closed-source model evaluations: users can include those results in leaderboard views or hide them with a toggle or an account setting.

That design choice is why the Reddit thread is worth more than a product note. Papers with Code historically implied a tight relationship between a result, a paper, and code someone else could inspect or run. In 2026, many frontier results arrive through product posts, model cards, or technical reports. Excluding them would make leaderboards feel incomplete. Mixing them in without labels would blur the difference between an independently reproducible result and a vendor-reported number.

The useful compromise is visibility with metadata. If a leaderboard marks closed results clearly and lets readers filter them out, it can show the competitive landscape while preserving the distinction that researchers care about. The change reflects a broader shift in AI evaluation culture: benchmark pages are no longer just indexes of reproducible papers. They are becoming maps of claims, sources, openness levels, and model availability. That is a harder job than ranking rows by a score.
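To make "visibility with metadata" concrete, here is a minimal sketch of a leaderboard that keeps closed results but labels them and lets the reader filter them out. It is not the Papers with Code data model or API; the `Result` fields (`open_code`, `source`), the `include_closed` flag, and the model names and scores are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard entry (hypothetical schema, for illustration only)."""
    model: str
    score: float
    source: str      # where the number came from, e.g. a paper or a vendor report
    open_code: bool  # True if a paper/repo lets others rerun the evaluation


def leaderboard(results, include_closed=True):
    """Rank results by score, optionally hiding closed-source entries.

    Filtering happens at view time, so closed results stay in the data
    and only disappear from this particular view.
    """
    rows = [r for r in results if include_closed or r.open_code]
    return sorted(rows, key=lambda r: r.score, reverse=True)


def render(rows):
    """Format ranked rows, attaching the openness label and source to each one."""
    lines = []
    for rank, r in enumerate(rows, start=1):
        label = "open" if r.open_code else "closed (vendor-reported)"
        lines.append(f"{rank}. {r.model}: {r.score:.1f} [{label}, source: {r.source}]")
    return "\n".join(lines)


# Hypothetical sample data: names and scores are placeholders, not real results.
SAMPLE_RESULTS = [
    Result("open-model-x", 42.0, "arXiv paper", True),
    Result("closed-model-y", 55.0, "vendor technical report", False),
    Result("open-model-z", 38.5, "arXiv paper", True),
]

print(render(leaderboard(SAMPLE_RESULTS)))                        # all rows, labeled
print(render(leaderboard(SAMPLE_RESULTS, include_closed=False)))  # open rows only
```

The design choice the sketch highlights is that the openness label travels with every row, so a mixed view never hides which numbers are independently reproducible, and the toggle changes only what is displayed, not what is stored.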

