Demis Hassabis: "The True AGI Test Is an AI Deriving General Relativity on Its Own"
Hassabis's AGI Test
Google DeepMind CEO Demis Hassabis has proposed a striking benchmark for determining whether we've achieved true artificial general intelligence. His remarks garnered over 1,800 upvotes on Reddit's r/singularity community.
The Einstein Test
Hassabis described his idea as follows:
"The kind of test I would be looking for is training an AI system with a knowledge cutoff of, say, 1911, and then seeing if it could come up with general relativity, like Einstein did in 1915. That's the kind of test I think is a true test of whether we have a full AGI system."
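The core of the protocol is simply restricting the training corpus to material published before the cutoff date. A minimal sketch of that filtering step is below; the paper titles and years are illustrative examples, and this is not DeepMind's actual pipeline.

```python
# Hypothetical sketch: enforce a 1911 knowledge cutoff by filtering a
# corpus to documents published strictly before the cutoff year.
CUTOFF_YEAR = 1911

# Illustrative corpus entries: (title, publication year)
corpus = [
    ("On the Electrodynamics of Moving Bodies", 1905),
    ("The Foundation of the General Theory of Relativity", 1916),
]

def within_cutoff(year: int, cutoff: int = CUTOFF_YEAR) -> bool:
    """True if a document would be visible to a model with this cutoff."""
    return year < cutoff

# Only pre-1911 material survives; the 1916 general relativity paper
# is exactly what the model would be asked to rediscover.
training_set = [title for title, year in corpus if within_cutoff(year)]
print(training_set)  # ['On the Electrodynamics of Moving Bodies']
```

In practice the hard part is not this filter but provenance: verifying that no post-cutoff knowledge leaks in through later editions, commentaries, or metadata.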
Why This Test Is Meaningful
This benchmark goes well beyond pattern recognition or memorization. Einstein's general relativity wasn't derived by analyzing data — it required synthesizing disparate physical principles in a fundamentally new way, driven by deep intuition and novel reasoning.
By Hassabis's standard, today's LLMs would fail this test. Current AI systems excel at synthesizing and summarizing existing knowledge but have not demonstrated the ability to derive genuinely new physical laws from first principles.
Implications for AGI Research
The statement offers a window into how Hassabis thinks about the goal of AGI at DeepMind. Rather than measuring task performance, he envisions AGI as a system capable of expanding the boundaries of human knowledge — not just operating within them.
This framing places the bar significantly higher than many current AGI benchmarks, which often focus on whether AI can perform human-level tasks rather than whether it can make Einstein-level scientific discoveries.