r/singularity Tracks Symbolica’s 36.08% ARC-AGI-3 Result and Its Cost Advantage
Original post: From 0% to 36% on Day 1 of ARC-AGI-3
What Symbolica reported
A March 2026 r/singularity post brought fresh attention to Symbolica’s ARC-AGI-3 results; the thread had reached 203 points and 82 comments at crawl time. According to Symbolica’s write-up, its Agentica SDK achieved an unverified 36.08% on the ARC-AGI-3 public eval set, solving 113 of 182 playable levels and fully completing 7 of the 25 available games.
The company’s framing is important. This is not presented as a pure chain-of-thought benchmark run. It is an agentic system result, where the SDK is sandboxed and allowed to run persistent tasks, including puzzle-solving loops. That distinction helps explain why the community paid attention: ARC-style evaluations are increasingly being treated as tests of structured reasoning and interaction, not just next-token fluency.
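To make the distinction concrete, here is a minimal sketch of what an agentic evaluation loop looks like, with persistent state and stepwise feedback. Every name in it (ToyPuzzleEnv, ToyAgent, propose_action) is a hypothetical illustration; Symbolica’s post does not document the Agentica SDK’s actual API.

```python
# Hypothetical sketch of an agentic evaluation loop, NOT Symbolica's
# Agentica SDK. It illustrates how an agent differs from a single
# chain-of-thought pass: it acts, observes feedback, and carries
# persistent state across many steps.

from dataclasses import dataclass, field


@dataclass
class ToyPuzzleEnv:
    """Stand-in for an ARC-AGI-3-style interactive level: the agent
    must raise a hidden counter to a target by trial and error."""
    target: int = 5
    state: int = 0

    def observe(self) -> int:
        return self.state

    def step(self, action: str) -> bool:
        if action == "increment":
            self.state += 1
        elif action == "reset":
            self.state = 0
        return self.state == self.target  # True once the level is solved


@dataclass
class ToyAgent:
    """The agent's memory survives across steps, unlike a one-shot prompt."""
    memory: list = field(default_factory=list)

    def propose_action(self, observation: int) -> str:
        # A real system would call a model here; this stub just
        # records what it saw and keeps incrementing.
        self.memory.append(observation)
        return "increment"


def run_episode(env: ToyPuzzleEnv, agent: ToyAgent, budget: int = 100) -> bool:
    """Run the interaction loop until the level is solved or the step
    budget is exhausted."""
    for _ in range(budget):
        if env.step(agent.propose_action(env.observe())):
            return True
    return False


if __name__ == "__main__":
    print("solved:", run_episode(ToyPuzzleEnv(), ToyAgent()))  # solved: True
```

The structural point is the loop itself: agent memory and environment state persist across steps, which is what separates this setup from a single chain-of-thought completion.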
The gap to reported baselines
Symbolica also emphasized cost efficiency. In its published comparison, Agentica’s 36.08% result is paired with an estimated cost of $1,005, versus 0.25% for Opus 4.6 Max at $8,900 and 0.3% for GPT 5.4 High, for which no cost is listed. Those numbers should be read carefully because the result is explicitly unverified, but they still explain the reaction. The story is not just a higher score; it is a materially different score-cost tradeoff under an agent loop.
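One way to ground that claim is cost per benchmark point, computed directly from the figures quoted in the post. The figures themselves remain unverified, and GPT 5.4 High is omitted because the comparison gives no cost for it.

```python
# Cost per ARC-AGI-3 point, using only the unverified figures quoted
# in Symbolica's comparison.
reported = {
    "Agentica SDK": (36.08, 1_005),   # (score %, estimated cost $)
    "Opus 4.6 Max": (0.25, 8_900),
}

for name, (score, cost) in reported.items():
    print(f"{name}: ${cost / score:,.0f} per point")

# Agentica SDK: $28 per point
# Opus 4.6 Max: $35,600 per point
```

On those reported numbers, the gap is roughly three orders of magnitude per point, which is why the cost framing drew as much attention as the score itself.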
Where the system did best
The breakdown in the post shows especially strong performance on several games: Symbolica lists 97.60 on CN04, 84.16 on LP85, 83.28 on AR25, and 77.59 on FT09. Scores on the remaining games fall off sharply, which is equally informative. The pattern suggests that the current agent stack is not uniformly strong across the benchmark, but it can already dominate specific puzzle families well enough to change the conversation.
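A quick check on the quoted numbers shows how skewed the distribution is: the four listed games average well above the overall figure, so the remaining games must sit far below it.

```python
# The four strongest reported game scores versus the overall result,
# using only the numbers quoted in the post.
top_games = {"CN04": 97.60, "LP85": 84.16, "AR25": 83.28, "FT09": 77.59}

top_avg = sum(top_games.values()) / len(top_games)
print(f"avg of listed top games: {top_avg:.2f}")  # -> 85.66
print("overall reported score: 36.08")
```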
For AI readers, the significance is not that ARC-AGI-3 is solved. It clearly is not. The more useful takeaway is that tool-using, persistent agent systems may now be the more interesting variable than chain-of-thought prompting alone. r/singularity reacted to this as a sign that benchmark progress is moving from passive reasoning toward active orchestration. If future independent verification lands near the same range, this result will look like an early marker of that shift rather than an isolated leaderboard anomaly.
Primary source: Symbolica’s ARC-AGI-3 post. Community discussion: r/singularity.