Google DeepMind Proposes a Cognitive Framework for Measuring AGI Progress

Original: Measuring progress toward AGI: A cognitive framework

LLM · Mar 24, 2026 · By Insights AI · 2 min read

Google DeepMind is trying to turn the AGI debate into something more measurable. In a March 17, 2026 paper and companion blog post, the company introduced what it calls a cognitive framework for tracking progress toward artificial general intelligence. Instead of asking whether a single model has crossed an abstract threshold, DeepMind argues that researchers need a structured way to evaluate distinct cognitive abilities and compare system performance against human baselines.

The proposed taxonomy draws from psychology, neuroscience, and cognitive science. DeepMind identifies 10 capabilities that it says are likely to matter for general intelligence in AI systems: perception, generation, attention, learning, memory, reasoning, metacognition, executive functions, problem solving, and social cognition. The point is not just to create another list of benchmarks, but to give evaluators a clearer vocabulary for what a model can and cannot do.

The evaluation protocol has three stages:

  • Test AI systems across a broad suite of cognitive tasks with held-out data to reduce contamination risk.
  • Collect human baselines for the same tasks from a demographically representative adult sample.
  • Map each system's performance against the distribution of human performance for each ability.
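The third stage can be sketched in a few lines of code. The snippet below is a minimal illustration, not DeepMind's actual methodology: it assumes the output of stage 2 is a list of human scores for one ability, and maps a system's score to its percentile rank within that human distribution. The ability name and all scores are invented for illustration.

```python
# Illustrative sketch of stage 3: locating an AI system's score within
# the distribution of human baseline scores for one cognitive ability.
# The ability name and all numbers below are hypothetical examples,
# not values from the DeepMind paper.

def human_percentile(system_score: float, human_scores: list[float]) -> float:
    """Fraction of human baseline scores at or below the system's score."""
    at_or_below = sum(1 for s in human_scores if s <= system_score)
    return at_or_below / len(human_scores)

# Hypothetical human baseline for a "memory" task (accuracy on [0, 1]),
# as would be collected in stage 2 from a representative adult sample.
human_memory = [0.62, 0.70, 0.71, 0.74, 0.78, 0.80, 0.83, 0.85, 0.88, 0.91]

# A system scoring 0.82 sits above 60% of the human sample on this task.
print(human_percentile(0.82, human_memory))  # 0.6
```

In a real evaluation, this mapping would be repeated per ability across the full task suite, so each system gets a profile of percentile positions rather than a single score.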

DeepMind is also trying to push the framework into practice instead of leaving it as theory. Alongside the paper, it launched a Kaggle hackathon focused on five abilities where it sees the biggest evaluation gaps: learning, metacognition, attention, executive functions, and social cognition. Participants can build evaluations on Kaggle's Community Benchmarks platform, and the total prize pool is $200,000. Submissions are open from March 17 through April 16, with results scheduled for June 1.

There is an obvious strategic angle here. Frontier AI labs increasingly talk about AGI, but the term remains elastic and politically charged. By grounding the discussion in cognitive science and relative human performance, DeepMind is proposing a more defensible framework for future claims. That does not solve every measurement problem, but it does move the conversation away from vague rhetoric and toward reproducible evaluation design.

Whether the framework becomes widely adopted will depend on how useful the resulting tasks are and whether other labs treat the taxonomy as neutral enough to share. Still, the release is significant because it combines a conceptual model, a concrete protocol, and an incentive for outside researchers to build the missing benchmarks. In the current AI race, better measurement may end up being almost as important as better models. Source: Google DeepMind.




© 2026 Insights. All rights reserved.