Mathematicians Challenge AI: Show Us Your Proof Work
Original: Mathematicians Issue a Major Challenge to AI—Show Us Your Work
The Challenge
Leading mathematicians have launched an unprecedented exam called "First Proof" to test whether artificial intelligence can solve genuine research problems that have no published solution. The initiative responds to growing concern over unverified claims by AI companies about mathematical breakthroughs.
Why This Matters
The mathematical community has grown skeptical of recent AI achievements. Andrew Sutherland of MIT noted: "These are brand-new problems that cannot be found in any LLM's training data." Because the problems are new, an AI system cannot pass the exam by simply retrieving an existing solution from its training data.
Past AI accomplishments raised red flags. One startup's celebrated proof turned out to be a misrepresented literature search result. Additionally, most published papers on AI mathematics come from the companies producing the AI systems themselves, creating an appearance of self-promotion rather than independent verification.
The Test Structure
Eleven mathematicians, including a Fields Medal winner, contributed problems from their own research that have no published solution. The exam focuses on "lemmas," the small theorems that mathematicians prove on the way to larger results, because these better reflect how AI might be used in day-to-day mathematical work.
Crucially, encrypted proofs were submitted beforehand, with decryption scheduled for February 13, ensuring answers cannot be fabricated after the fact. The participating AI systems have one week to solve these problems.
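The commit-then-reveal idea behind this step can be sketched in a few lines of Python. This is only an illustration, not the organizers' actual procedure: the article says only that the proofs were encrypted, whereas the sketch substitutes a salted SHA-256 hash commitment, and the function names commit and verify are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(proof_text: str) -> tuple[str, bytes]:
    """Return (public digest, secret nonce) for a proof (hypothetical helper)."""
    # A random nonce stops anyone from guessing a short proof by brute force.
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + proof_text.encode("utf-8")).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, revealed_text: str) -> bool:
    """Check that a revealed proof matches the digest published earlier."""
    expected = hashlib.sha256(nonce + revealed_text.encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    digest, nonce = commit("Lemma 1: placeholder proof text.")
    print(verify(digest, nonce, "Lemma 1: placeholder proof text."))
    print(verify(digest, nonce, "Lemma 1: a proof edited afterwards."))
```

Publishing the digest before the exam, then releasing the nonce and proof text on the reveal date, would let anyone confirm that the proof was not altered in between.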
Future Potential
Mathematicians see AI's near-term value not in solving landmark open problems but in speeding up the tedious parts of research, which could make mathematical investigation more efficient across the field.