The latest ARC-AGI-3 scores show GPT-5.5 High at 0.43% and Claude Opus 4.7 at 0.18% — the most powerful models today remain effectively at zero on this AGI benchmark.
#arc-agi
A March 2026 r/singularity post with 203 points and 82 comments highlighted Symbolica’s claim that its Agentica SDK reached an unverified 36.08% on ARC-AGI-3. The headline numbers were 113 of 182 playable levels solved, 7 of 25 games completed, and a much lower reported cost than chain-of-thought baselines.
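As a quick sanity check on those figures (a back-of-the-envelope sketch using only the numbers quoted above), neither the raw level fraction nor the raw game fraction matches the 36.08% headline, which suggests the headline score comes from ARC Prize's own scoring rather than a simple level count:

```python
# Back-of-the-envelope check of the reported Agentica numbers on ARC-AGI-3.
levels_solved, levels_total = 113, 182
games_completed, games_total = 7, 25

level_fraction = levels_solved / levels_total   # ~0.621
game_fraction = games_completed / games_total   # 0.280

print(f"Levels solved:   {level_fraction:.1%}")  # 62.1%
print(f"Games completed: {game_fraction:.1%}")   # 28.0%

# Neither raw fraction equals the 36.08% headline score, so the official
# number is presumably produced by ARC Prize's own scoring (possibly the
# action-efficient scoring discussed below), not by counting levels alone.
```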
Right after ARC Prize released ARC-AGI-3, r/singularity focused on the benchmark's shift toward interactive environments and action-efficient scoring. The core takeaway was that frontier AI still lags far behind humans when it must generalize, explore, and plan under tight interaction budgets.
ARC Prize describes ARC-AGI-3 as an interactive reasoning benchmark that measures planning, memory compression, and belief updating inside novel environments rather than answers to static puzzles. The launch drew heavy attention on Hacker News because it gives agent builders a more behavior-first way to compare systems against humans.
ARC Prize introduced ARC-AGI-3 on March 24, 2026, as a benchmark for frontier agentic intelligence in novel environments. On Hacker News it reached 238 points and 163 comments, signaling strong interest in evaluation methods that go beyond static tasks.
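To make the interactive, action-budgeted setup concrete, here is a minimal sketch of the kind of agent-environment loop such a benchmark implies. Everything in it (ToyEnv, RandomAgent, run_episode, the budget value) is an illustrative assumption, not the actual ARC-AGI-3 API; the point is only that an agent is scored on what it solves within a hard cap on interactions, so wasted exploratory actions cost it directly.

```python
import random
from dataclasses import dataclass

# Illustrative sketch only: ToyEnv and RandomAgent stand in for an unseen
# interactive environment and an agent; they are not the ARC-AGI-3 API.

@dataclass
class EpisodeResult:
    solved: bool
    actions_used: int

class ToyEnv:
    """A trivial 'find the hidden cell' game standing in for a novel environment."""
    def __init__(self, size: int = 10, seed: int = 0):
        self.size = size
        self.target = random.Random(seed).randrange(size)

    def reset(self) -> int:
        return -1  # the agent starts with no information about the game

    def step(self, action: int) -> tuple:
        return action, action == self.target  # (observation, done)

class RandomAgent:
    """Explores blindly; a stronger agent would keep memory and plan ahead."""
    def __init__(self, size: int = 10):
        self.size = size

    def reset(self) -> None:
        pass  # nothing remembered between games

    def act(self, observation) -> int:
        return random.randrange(self.size)

    def update(self, observation) -> None:
        pass  # no belief updating here, which is exactly the weakness

def run_episode(env, agent, action_budget: int) -> EpisodeResult:
    """Run one episode under a hard cap on interactions (action efficiency)."""
    obs = env.reset()
    agent.reset()
    for step in range(action_budget):
        action = agent.act(obs)       # plan the next move from current beliefs
        obs, done = env.step(action)  # act in the environment
        agent.update(obs)             # update beliefs from the new observation
        if done:
            return EpisodeResult(solved=True, actions_used=step + 1)
    return EpisodeResult(solved=False, actions_used=action_budget)

print(run_episode(ToyEnv(), RandomAgent(), action_budget=20))
```

Scoring episodes like this by actions used, rather than by final answers alone, is what separates this style of evaluation from static benchmarks.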