Mathematicians Challenge AI: Show Us Your Proof Work
The Challenge
Leading mathematicians have launched an unprecedented exam called "First Proof" to test whether artificial intelligence can solve genuine, previously unsolved mathematical problems. The initiative addresses growing concerns about unverified claims from AI companies regarding mathematical breakthroughs.
Why This Matters
The mathematical community has grown skeptical of recent claims of AI achievements, so the exam is designed to rule out memorization. Andrew Sutherland of MIT noted: "These are brand-new problems that cannot be found in any LLM's training data." Because the problems have never appeared anywhere, an AI cannot simply retrieve existing solutions from its training materials.
Past AI accomplishments have raised red flags. One startup's celebrated proof turned out to be an existing result turned up by a literature search and misrepresented as new. In addition, most published papers on AI mathematics come from the companies building the AI systems themselves, which looks more like self-promotion than independent verification.
The Test Structure
Eleven mathematical experts, including a Fields Medal winner, contributed unsolved problems from their research. The exam focuses on "lemmas"—small theorems that mathematicians prove while working toward larger results—which represent more realistic applications of AI in daily mathematical work.
Crucially, encrypted proofs were submitted beforehand, with decryption scheduled for February 13, ensuring answers cannot be fabricated after the fact. The participating AI systems have one week to solve these problems.
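The precommitment step can be illustrated with a simple commit-and-reveal pattern. The sketch below is a hypothetical Python illustration using a hash-based commitment; the actual First Proof setup reportedly relies on encrypted proof files with a later decryption date, and the function names here are illustrative only.

```python
import hashlib
import secrets

def commit(proof_text: str) -> tuple[str, str]:
    """Commit to a proof before the deadline by publishing only its hash.

    Returns (commitment, nonce); the random nonce prevents anyone from
    guessing the committed text from the hash alone.
    """
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256((nonce + proof_text).encode()).hexdigest()
    return digest, nonce

def reveal(proof_text: str, nonce: str, commitment: str) -> bool:
    """On the reveal date, anyone can check that the published proof
    matches the earlier commitment, so it cannot have been written
    after the fact."""
    digest = hashlib.sha256((nonce + proof_text).encode()).hexdigest()
    return digest == commitment

# Illustrative usage: commit before the challenge starts...
commitment, nonce = commit("Lemma 3.2: ... full proof text ...")
# ...then publish the proof and nonce on the scheduled reveal date.
assert reveal("Lemma 3.2: ... full proof text ...", nonce, commitment)
```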
Future Potential
Rather than expecting AI to crack landmark open problems, mathematicians see its near-term value in speeding up the tedious parts of research, potentially making mathematical investigation more efficient across the field.
Related Articles
Google DeepMind's Aletheia AI research agent solved 6 out of 10 open research-level math problems in the FirstProof Challenge as judged by expert mathematicians. The system also generated a fully autonomous research paper and solved 4 open conjectures from Bloom's Erdős database.
Anthropic said Claude Opus 4.6 found 22 Firefox vulnerabilities during a two-week collaboration with Mozilla. Mozilla classified 14 as high severity and shipped fixes in Firefox 148.0.
A new paper discussed in r/MachineLearning argues that unofficial model-access providers can quietly substitute models and distort both research and production results.