Demis Hassabis Proposes Definitive AGI Test: Could AI Discover General Relativity?
The Einstein Test: A Concrete Benchmark for AGI
Google DeepMind CEO Demis Hassabis has proposed a specific test for determining whether true AGI has been achieved, and the proposal has sparked wide discussion online.
In a YouTube interview, Hassabis described his vision: "The kind of test I would be looking for is training an AI system with a knowledge cutoff of, say, 1911, and then seeing if it could come up with general relativity, like Einstein did in 1915. That's the kind of test I think is a true test of whether we have a full AGI system."
Why This Test Matters
The power of this proposal lies in what it measures: not memorization or pattern recognition, but genuine scientific reasoning and creative discovery. General relativity required Einstein to synthesize existing mathematical tools and physical observations into an entirely new conceptual framework — something that goes far beyond recombining known information.
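- Physics and mathematics available by 1911: Newtonian mechanics, electromagnetism, special relativity (1905), Minkowski's spacetime formulation (1908), Einstein's equivalence principle (1907), and Riemannian geometry with tensor calculus
- Einstein's 1915 achievement: Describing gravity as the curvature of spacetime, building on the equivalence principle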
- Required capability: Paradigm-breaking conceptual innovation
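Hassabis did not spell out exactly what output would count as a pass, but one natural reading is that the system would need to arrive, unaided, at something equivalent to Einstein's field equations of November 1915. In modern notation (and without the cosmological constant Einstein added only in 1917), they relate the curvature of spacetime on the left to its matter and energy content on the right:

$$
R_{\mu\nu} - \tfrac{1}{2} R \, g_{\mu\nu} = \frac{8\pi G}{c^4} \, T_{\mu\nu}
$$

Here $g_{\mu\nu}$ is the spacetime metric, $R_{\mu\nu}$ and $R$ are the Ricci tensor and Ricci scalar built from it, $T_{\mu\nu}$ is the stress-energy tensor, $G$ is Newton's gravitational constant, and $c$ is the speed of light. None of this equation appears in a 1911 corpus, even though every mathematical ingredient needed to write it down already existed.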
The Gap Between Current LLMs and AGI
While today's large language models excel at synthesizing and explaining existing concepts, their ability to independently construct fundamentally new physical theories remains unproven. Hassabis's test makes this distinction concrete.
A Reddit post of the quote earned over 2,800 upvotes on r/singularity, prompting further discussion about what the ultimate goal of AI research is and how far current systems remain from reaching it.
Competing Definitions of AGI
Hassabis's proposal also highlights how differently AGI is defined. OpenAI's charter describes AGI as "highly autonomous systems that outperform humans at most economically valuable work," whereas Hassabis sets a far more demanding bar: the ability to make genuine scientific discoveries. Which definition one adopts shapes how progress in AI development is measured and evaluated.