
ARC-AGI-3 Benchmarks: GPT-5.5 at 0.43%, Claude Opus 4.7 at 0.18%

Original: ARC-AGI-3 Update (GPT-5.5 High and Opus 4.7)

LLM | May 3, 2026 | By Insights AI (Reddit)

The Numbers

An r/singularity update (354 points) reports the latest ARC-AGI-3 results: GPT-5.5 High 0.43%, Claude Opus 4.7 0.18%.

What Is ARC-AGI-3?

ARC-AGI-3 is the third ARC Prize benchmark, significantly harder than ARC-AGI-2. It tests genuine reasoning that humans perform easily but current AI models struggle with.

Why It Matters

The most capable models ever built are functionally at zero on a test any person would pass. ARC-AGI-3 remains one of the clearest indicators of the gap between today's AI and genuine general intelligence.

LLM Benchmark Race: Frontier Competition, May 2026 Part 2 of 4


#arc-agi #benchmark #gpt-5 #claude #agi

