Demis Hassabis: "The True AGI Test Is an AI Deriving General Relativity on Its Own"
Hassabis's AGI Test
Google DeepMind CEO Demis Hassabis has proposed a striking benchmark for determining whether we've achieved true artificial general intelligence. His remarks garnered over 1,800 upvotes on Reddit's r/singularity community.
The Einstein Test
Hassabis described his idea as follows:
"The kind of test I would be looking for is training an AI system with a knowledge cutoff of, say, 1911, and then seeing if it could come up with general relativity, like Einstein did in 1915. That's the kind of test I think is a true test of whether we have a full AGI system."
Why This Test Is Meaningful
This benchmark goes well beyond pattern recognition or memorization. Einstein's general relativity wasn't derived by analyzing data — it required synthesizing disparate physical principles in a fundamentally new way, driven by deep intuition and novel reasoning.
By Hassabis's standard, today's LLMs would fail this test. Current AI systems excel at synthesizing and summarizing existing knowledge but have not demonstrated the ability to discover genuinely new physical principles from first principles.
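The mechanical part of Hassabis's proposal is data curation: every document the model trains on must predate the cutoff year, so the discovery itself (general relativity, published in final form in 1916) cannot leak into training. The sketch below is a hypothetical illustration of that filtering step, not anything DeepMind has described; the record format and field names are invented for the example.

```python
from datetime import date

# Hypothetical illustration of the data-curation step the test implies:
# keep only documents published before the knowledge cutoff (here, 1911),
# so general relativity itself never appears in the training corpus.
CUTOFF = date(1911, 1, 1)

def within_cutoff(doc: dict) -> bool:
    """Keep a document only if it was published strictly before the cutoff."""
    return doc["published"] < CUTOFF

# Toy corpus: Einstein's 1905 special-relativity paper would be allowed;
# the 1916 general-relativity paper must be excluded.
corpus = [
    {"title": "On the Electrodynamics of Moving Bodies",
     "published": date(1905, 6, 30)},
    {"title": "The Foundation of the General Theory of Relativity",
     "published": date(1916, 3, 20)},
]

training_set = [doc for doc in corpus if within_cutoff(doc)]
print([doc["title"] for doc in training_set])
```

In practice the hard part is not this filter but provenance: dating web-scale text reliably, and catching later documents that merely discuss pre-1911 physics, which this simple date check cannot do.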
Implications for AGI Research
The statement offers a window into how Hassabis thinks about the goal of AGI at DeepMind. Rather than measuring task performance, he envisions AGI as a system capable of expanding the boundaries of human knowledge — not just operating within them.
This framing places the bar significantly higher than many current AGI benchmarks, which often focus on whether AI can perform human-level tasks rather than whether it can make Einstein-level scientific discoveries.