ARC-AGI-3 Benchmarks: GPT-5.5 at 0.43%, Claude Opus 4.7 at 0.18%
Original: ARC-AGI-3 Update (GPT-5.5 High and Opus 4.7)
The Numbers
A post on r/singularity (354 points) reports the latest ARC-AGI-3 results: GPT-5.5 High scores 0.43% and Claude Opus 4.7 scores 0.18%.
What Is ARC-AGI-3?
ARC-AGI-3 is the third benchmark in the ARC Prize series and is significantly harder than ARC-AGI-2. It tests the kind of reasoning that humans perform easily but that current AI models struggle with.
Why It Matters
The most capable models ever built score functionally at zero on a test that any person would pass. ARC-AGI-3 remains one of the clearest indicators of the gap between today's AI and genuine general intelligence.