Berkeley Shows How Benchmark Hacking Can Inflate AI Agent Scores

Original: How We Broke Top AI Agent Benchmarks: And What Comes Next View original →

Read in other languages: 한국어日本語
AI Apr 12, 2026 By Insights AI (HN) 1 min read 1 views Source

Why the post mattered on Hacker News

The UC Berkeley write-up published in April 2026 drew 202 points and 58 comments on Hacker News by April 12, 2026. Its premise is unusually direct: the researchers audited eight widely cited AI agent benchmarks and found ways to achieve near-perfect scores without solving the tasks those benchmarks were supposed to measure.

The paper’s broader claim is that benchmark numbers are not a reliable proxy for capability when the evaluation pipeline is easy to exploit. The authors focus on three recurring failure modes: agents can tamper with the artifacts the evaluator later reads, gold answers are sometimes exposed inside configs or public files, and validators often score superficial output patterns rather than actual task completion.

Examples Berkeley highlights

  • On SWE-bench Verified, the team says a short conftest.py hook can force every test to pass.
  • On Terminal-Bench, a fake curl wrapper can produce a perfect score across all 89 tasks.
  • On WebArena, an agent can navigate Chromium to a local file:// path and read answer keys from config files.
  • On FieldWorkArena, the validator reportedly checks only whether the final message came from the assistant, so sending {} is enough to pass.

What comes next

The article does more than embarrass benchmark builders. It lays out concrete hardening steps: isolate evaluator state, prevent agents from writing to the paths that scoring code trusts, use more robust scoring, and keep ground truth private for any benchmark that drives public leaderboards. The authors are also developing BenchJack, an automated benchmark vulnerability scanner. For teams using benchmark tables to decide which agent stack to deploy, the message is hard to miss: don’t trust the number before you trust the methodology.

Original source: UC Berkeley RDI. Hacker News discussion: thread.

Share: Long

Related Articles

AI sources.twitter Mar 17, 2026 2 min read

OpenAI said on March 9, 2026 that it plans to acquire Promptfoo. The company said Promptfoo's technology will strengthen agentic security testing and evaluation inside OpenAI Frontier, while Promptfoo remains open source under its current license and existing customers continue to receive support.

Comments (0)

No comments yet. Be the first to comment!

Leave a Comment

© 2026 Insights. All rights reserved.