Google DeepMind proposes a cognitive framework for measuring AGI progress
Original: Measuring progress toward AGI: A cognitive framework
Google DeepMind announced on March 17, 2026, that it has published a new paper on how to measure progress toward AGI, arguing that current discussions often lack a durable empirical framework. Rather than claiming that AGI is near or that any single benchmark can settle the question, the paper proposes a cognitive-science approach for describing and comparing the capabilities of AI systems. DeepMind positions the work as an attempt to improve the measurement layer that sits between frontier-model marketing claims and any serious assessment of general intelligence.
The paper identifies 10 cognitive abilities that DeepMind argues are important for general intelligence: perception, generation, attention, learning, memory, reasoning, metacognition, executive functions, problem solving, and social cognition. It then proposes a three-stage evaluation protocol. First, researchers should test AI systems across a broad suite of tasks that cover each ability and use held-out sets to limit contamination. Second, they should gather human baselines for the same tasks from a demographically representative sample of adults. Third, they should map model performance against the distribution of human performance rather than treating raw scores in isolation.
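To make the third stage concrete, here is a minimal Python sketch of human-relative scoring, assuming the mapping is an empirical percentile within the human baseline sample; the paper may define a different statistic, and every ability name, baseline sample, and score below is a hypothetical illustration rather than data from DeepMind.

```python
# Minimal sketch of stage three of the proposed protocol: mapping a model's
# raw task score onto the distribution of human performance instead of
# reporting the raw score alone. All names and numbers here are hypothetical
# illustrations, not figures from the DeepMind paper.

def human_relative_percentile(model_score: float, human_scores: list[float]) -> float:
    """Return the fraction of human baseline scores at or below the model's score."""
    if not human_scores:
        raise ValueError("human baseline sample is empty")
    return sum(h <= model_score for h in human_scores) / len(human_scores)

# Hypothetical per-ability raw accuracies from a representative human sample.
human_baselines = {
    "memory":    [0.55, 0.61, 0.64, 0.70, 0.72, 0.78, 0.81, 0.85],
    "reasoning": [0.40, 0.48, 0.52, 0.57, 0.63, 0.66, 0.74, 0.80],
}
model_scores = {"memory": 0.75, "reasoning": 0.50}  # hypothetical model results

for ability, score in model_scores.items():
    pct = human_relative_percentile(score, human_baselines[ability])
    print(f"{ability}: raw score {score:.2f} is at or above {pct:.0%} of the human sample")
```

Reported this way, a model with a strong raw score on one ability can still land in the middle of the human distribution on another, which is the kind of contrast a single leaderboard number hides.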
To move the idea from theory to practice, DeepMind and Kaggle also launched a hackathon focused on five areas where the evaluation gap is largest: learning, metacognition, attention, executive functions, and social cognition. Participants can build benchmarks on Kaggle's Community Benchmarks platform and test them against a lineup of frontier models. Google says the competition carries a total prize pool of $200,000, with submissions open from March 17 through April 16 and results scheduled for June 1.
Why it matters
- Benchmark design increasingly shapes how labs, investors, and regulators interpret frontier-model progress.
- DeepMind is pushing for human-relative measurement rather than single-score leaderboard thinking.
- The Kaggle hackathon turns an abstract framework into a community effort to build reusable evaluations.
The announcement does not claim AGI has been reached. Instead, it shows one major lab trying to standardize how progress claims are evaluated before they harden into industry narrative. If the framework gains adoption, it could shape how future model releases are compared, how capability gaps are discussed, and whether public arguments about AGI become more evidence-driven.
Related Articles
Google DeepMind said on X that it is launching a Kaggle hackathon with $200,000 in prizes to build new cognitive evaluations for AI. The linked Google post says the effort is part of a broader framework for measuring AGI progress across 10 cognitive abilities rather than a single benchmark.
Google DeepMind's Aletheia AI research agent solved 6 out of 10 open research-level math problems in the FirstProof Challenge as judged by expert mathematicians. The system also generated a fully autonomous research paper and solved 4 open conjectures from Bloom's Erdős database.
A new paper discussed in r/MachineLearning argues that unofficial model-access providers can quietly substitute models and distort both research and production results.