GPT-5.5 and Claude Opus 4.7 Both Score Under 1% on ARC-AGI-3
Original: ARC-AGI-3 Update (GPT-5.5 High and Opus 4.7)
ARC-AGI-3 Is No Joke
Even the most capable frontier models barely register on ARC-AGI-3. The latest community-shared results put GPT-5.5 High at 0.43% and Claude Opus 4.7 at 0.18% — effectively near-zero performance on a benchmark that most humans handle with ease.
What Is ARC-AGI-3?
The Abstraction and Reasoning Corpus (ARC-AGI) tests abstract pattern recognition and reasoning: tasks that are trivial for humans yet remain very hard for LLMs. Version 3 is significantly harder than its predecessors, shifting from the static grid puzzles of earlier versions toward interactive, game-like environments. As one r/singularity commenter put it: "If AI can't play games a 3-year-old could play, something is wrong with current models."
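For readers unfamiliar with the format, earlier ARC-AGI versions present each task as a handful of input/output grid pairs, and a prediction only counts if the output grid matches exactly. The sketch below is a toy illustration of that setup, not real benchmark data: the example task, the "mirror each row" rule, and the scoring function are all invented for demonstration, and ARC-AGI-3's interactive format works differently.

```python
# Illustrative sketch only: a toy task in the style of earlier ARC-AGI
# versions (train/test grid pairs scored by exact match). The grids,
# the hidden rule, and the scorer are hypothetical, not benchmark data.

from typing import Callable, List

Grid = List[List[int]]  # small grids of integer color codes

# A toy task: every training pair demonstrates the same hidden rule
# (here, mirror the grid left-to-right); the solver must infer it.
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 4], [5, 0]], "output": [[4, 0], [0, 5]]},
    ],
}

def candidate_solver(grid: Grid) -> Grid:
    """A hand-written guess at the hidden rule: flip each row."""
    return [list(reversed(row)) for row in grid]

def score(task: dict, solver: Callable[[Grid], Grid]) -> float:
    """Exact-match accuracy over the test pairs (all-or-nothing per grid)."""
    pairs = task["test"]
    correct = sum(solver(p["input"]) == p["output"] for p in pairs)
    return correct / len(pairs)

if __name__ == "__main__":
    print(f"toy task score: {score(task, candidate_solver):.0%}")  # 100% here
```

The all-or-nothing scoring is part of why frontier models land near zero: partially recognizing a pattern earns nothing unless the full output is reproduced.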
A Surprising Regression
Notably, Claude Opus 4.7 scored lower than Opus 4.6 on this benchmark, reigniting debate about whether newer models always improve across all dimensions. This suggests current training approaches may not be advancing genuine abstract reasoning, even as they improve on many other metrics.
The Road to 80%
The community is asking: how many months until a model cracks 80%? ARC-AGI-3 is increasingly being treated as a meaningful signal for true AGI progress — and the current results suggest that signal is still far from lighting up.