GPT-5.5 and Claude Opus 4.7 Both Score Under 1% on ARC-AGI-3

Original: ARC-AGI-3 Update (GPT-5.5 High and Opus 4.7)

AI · May 2, 2026 · By Insights AI (Reddit)

ARC-AGI-3 Is No Joke

Even the most capable frontier models barely register on ARC-AGI-3. The latest community-shared results put GPT-5.5 High at 0.43% and Claude Opus 4.7 at 0.18% — effectively near-zero performance on a benchmark that most humans handle with ease.

What Is ARC-AGI-3?

The Abstraction and Reasoning Corpus (ARC-AGI) tests abstract pattern recognition and reasoning: tasks that are trivial for humans but apparently very hard for LLMs. Version 3 is significantly harder than its predecessors, reportedly shifting from static grid puzzles toward interactive, game-like tasks. As one r/singularity commenter put it: "If AI can't play games a 3-year-old could play, something is wrong with current models."
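
To make the task format concrete, here is a minimal, hypothetical sketch of an ARC-style puzzle in Python. The grids, the hidden "increment every non-zero color" rule, and the apply_rule helper are all invented for illustration and are not drawn from any actual ARC-AGI-3 task; they only show the induce-a-rule-from-a-few-examples setup the benchmark family is built around.

```python
# Toy ARC-style task (illustrative only, not a real ARC-AGI-3 item).
# A task gives a few input -> output grid pairs; the solver must
# induce the hidden transformation and apply it to a new input.
# Hidden rule in this toy example: increment every non-zero color.

train_pairs = [
    ([[0, 1], [2, 0]], [[0, 2], [3, 0]]),
    ([[3, 0], [0, 4]], [[4, 0], [0, 5]]),
]

def apply_rule(grid):
    """The transformation a solver would need to induce from the pairs."""
    return [[c + 1 if c != 0 else 0 for c in row] for row in grid]

# Verify the induced rule explains every training pair...
assert all(apply_rule(inp) == out for inp, out in train_pairs)

# ...then apply it to a held-out test input.
test_input = [[0, 5], [6, 0]]
print(apply_rule(test_input))  # [[0, 6], [7, 0]]
```

The hard part for models is not applying the rule but inducing it from two or three examples with no prior exposure, which is the kind of abstraction the near-zero scores above are measuring.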

A Surprising Regression

Notably, Claude Opus 4.7 scored lower than Opus 4.6 on this benchmark, reigniting debate about whether newer models always improve across all dimensions. This suggests current training approaches may not be advancing genuine abstract reasoning, even as they improve on many other metrics.

The Road to 80%

The community is asking: how many months until a model cracks 80%? ARC-AGI-3 is increasingly being treated as a meaningful signal for true AGI progress — and the current results suggest that signal is still far from lighting up.
