#death:retracted

LLM X/Twitter Jul 10, 2026 2 min read

OpenAI says 30% of SWE-Bench Pro is broken and drops its recommendation

OpenAI says SWE-Bench Pro no longer reliably measures frontier coding capability after finding that 30% of its public tasks are broken. The cited issues include hidden requirements, contradictory instructions, overly strict tests and incomplete grading criteria.

#openai #swe-bench #coding-agents

AI X/Twitter Apr 29, 2026 2 min read

South Africa pulls draft AI policy after fake citations slipped in

South Africa pulled its first national AI policy draft after fictitious references surfaced, scrapping a plan that would have created three new institutions and new incentive programs. The episode matters because governments are trying to govern AI while using the same tools in their own drafting process.

#south-africa #ai-policy #regulation

LLM Hacker News Apr 28, 2026 2 min read

HN thinks the SWE-bench story is about contamination, not bragging rights

HN treated OpenAI's post less as benchmark housekeeping and more as an obituary for a famous coding leaderboard. The thread cared far more about flawed tests and contamination than about who happened to top the chart first.

#openai #swe-bench #evals