Google DeepMind's Aletheia Autonomously Solves 6 Research-Level Math Problems
Original: Google DeepMind's "Aletheia" just solved 6 open research-level math problems. Is this the AGI moment we've been waiting for?
Beyond Math Competitions
Google DeepMind's Aletheia AI agent has shown it can tackle genuine open problems in mathematics research, not just competition problems. A Reddit post in r/singularity (score: 291) highlighting the result sparked significant discussion about whether AI is approaching research-level capability in mathematics.
Key Achievements
- FirstProof Challenge: Aletheia autonomously solved 6 of the 10 open research-level math problems, according to the majority assessment of expert reviewers
- Bloom's Erdős Conjectures: In a semi-autonomous evaluation covering 700 open problems, Aletheia resolved 4 of them
- Autonomous research paper: Generated a fully AI-authored paper calculating eigenweight structure constants in arithmetic geometry
How Aletheia Works
Aletheia is built on Gemini Deep Think and uses a three-part agentic harness: a Generator that proposes candidate solutions, a Verifier that checks them for flaws, and a Reviser that corrects the errors the Verifier finds. The approach improves as more inference-time compute is spent: Gemini Deep Think now scores up to 90% on IMO-ProofBench Advanced, improving on the version that reached IMO gold-medal-level performance in July 2025.
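To make the three-part harness concrete, here is a minimal sketch of a generic generate-verify-revise loop. It is not DeepMind's implementation: the function `generate_verify_revise`, the `Review` type, the callable signatures, and the `max_revisions` budget are all hypothetical, and the toy "model" functions simply search for an integer whose square is 49. The only ideas taken from the article are the Generator, Verifier, and Reviser roles and the notion that a larger compute budget allows more correction rounds.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    """Hypothetical verifier output: a pass/fail flag plus feedback."""
    ok: bool
    feedback: str = ""


def generate_verify_revise(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Review],
    revise: Callable[[str, str, str], str],
    max_revisions: int = 8,
) -> Optional[str]:
    """Illustrative Generator -> Verifier -> Reviser loop (not DeepMind's code).

    max_revisions is a hypothetical stand-in for the inference-time compute
    budget: a larger budget allows more correction rounds before giving up.
    """
    candidate = generate(problem)          # Generator: propose a candidate
    review = verify(problem, candidate)    # Verifier: look for flaws
    rounds = 0
    while not review.ok and rounds < max_revisions:
        # Reviser: patch the candidate using the verifier's feedback
        candidate = revise(problem, candidate, review.feedback)
        review = verify(problem, candidate)
        rounds += 1
    return candidate if review.ok else None  # None means the budget ran out


# Toy stand-ins for model calls: find a positive integer x with x * x == 49.
def toy_generate(problem: str) -> str:
    return "1"  # always starts from the same naive guess


def toy_verify(problem: str, candidate: str) -> Review:
    x = int(candidate)
    if x * x == 49:
        return Review(ok=True)
    return Review(ok=False, feedback="too small" if x * x < 49 else "too large")


def toy_revise(problem: str, candidate: str, feedback: str) -> str:
    step = 1 if feedback == "too small" else -1
    return str(int(candidate) + step)


if __name__ == "__main__":
    # Enough budget: the loop converges on "7".
    print(generate_verify_revise("x*x == 49", toy_generate, toy_verify, toy_revise))
    # Too little budget: the loop gives up and returns None.
    print(generate_verify_revise("x*x == 49", toy_generate, toy_verify, toy_revise,
                                 max_revisions=3))
```

Keeping the three roles as separate callables mirrors the separation the article describes, and raising `max_revisions` is the knob in this sketch that corresponds to spending more inference-time compute.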
Mathematical Community Recognition
Fields Medalist Terence Tao and other leading mathematicians have recognized the significance of these results, describing Aletheia as a 'valuable research collaborator.' While Aletheia still struggles with many problems, the successes represent a qualitative leap in AI-assisted research.