Mathematicians Challenge AI: Show Us Your Proof Work

The Challenge

Leading mathematicians have launched an unprecedented exam called "First Proof" to test whether artificial intelligence can solve genuine, previously unsolved mathematical problems. The initiative addresses growing concerns about unverified claims from AI companies regarding mathematical breakthroughs.

Why This Matters

The mathematical community has grown skeptical of recent AI achievements. Andrew Sutherland of MIT noted: "These are brand-new problems that cannot be found in any LLM's training data." Because the problems are new, an AI system cannot simply retrieve existing solutions from its training data.

Past AI accomplishments raised red flags. One startup's celebrated proof turned out to be a misrepresented literature search result. Additionally, most published papers on AI mathematics come from the companies producing the AI systems themselves, creating an appearance of self-promotion rather than independent verification.

The Test Structure

Eleven mathematical experts, including a Fields Medal winner, contributed unsolved problems from their research. The exam focuses on "lemmas," small theorems that mathematicians prove while working toward larger results. These represent a more realistic application of AI in daily mathematical work.

Crucially, encrypted proofs were submitted beforehand, with decryption scheduled for February 13, ensuring answers cannot be fabricated after the fact. The participating AI systems have one week to solve these problems.
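The idea of locking in an answer before the test begins can be illustrated with a short sketch. The article says only that the proofs were encrypted and does not describe the scheme, so the code below swaps in a simpler, related technique: a salted hash commitment. The function names `commit` and `verify` and the sample lemma text are hypothetical and chosen for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit(proof_text: str) -> Tuple[str, bytes]:
    """Return (public digest, secret nonce) for a proof.

    Assumption: a salted SHA-256 commitment stands in for the
    encryption step, which the article does not specify.
    """
    nonce = secrets.token_bytes(32)  # random salt, kept secret until the reveal
    digest = hashlib.sha256(nonce + proof_text.encode("utf-8")).hexdigest()
    return digest, nonce


def verify(proof_text: str, nonce: bytes, digest: str) -> bool:
    """Check that a revealed proof and nonce match the published digest."""
    expected = hashlib.sha256(nonce + proof_text.encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, digest)


if __name__ == "__main__":
    # Before the exam: publish the digest, keep the proof and nonce private.
    proof = "Lemma 1 (hypothetical placeholder text for a proof)."
    digest, nonce = commit(proof)

    # On the reveal date: publish the proof and nonce so anyone can check them.
    print(verify(proof, nonce, digest))
    print(verify(proof + " (edited later)", nonce, digest))
```

Under this sketch, only the digest is public during the test week. Once the proof and nonce are released, anyone can recompute the hash, and any change made to the proof after the commitment would cause the check to fail.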

Future Potential

Mathematicians see AI's near-term value not in solving landmark open problems but in speeding up the tedious parts of research, which could make mathematical investigation more efficient across the field.
