ARC-AGI-3 Benchmarks: GPT-5.5 at 0.43%, Claude Opus 4.7 at 0.18%
Original: ARC-AGI-3 Update (GPT-5.5 High and Opus4.7)
The Numbers
An r/singularity update (354 points) reports the latest ARC-AGI-3 results: GPT-5.5 High 0.43%, Claude Opus 4.7 0.18%.
What Is ARC-AGI-3?
ARC-AGI-3 is the third ARC Prize benchmark, significantly harder than ARC-AGI-2. It tests genuine reasoning that humans perform easily but current AI models struggle with.
Why It Matters
The most capable models ever built are functionally at zero on a test any person would pass. ARC-AGI-3 remains one of the clearest indicators of the gap between today's AI and genuine general intelligence.