r/singularity Tracks Symbolica’s 36.08% ARC-AGI-3 Result and Its Cost Advantage
Original post: From 0% to 36% on Day 1 of ARC-AGI-3
What Symbolica reported
A March 2026 r/singularity post brought fresh attention to Symbolica’s ARC-AGI-3 results; the thread had reached 203 points and 82 comments at crawl time. According to Symbolica’s write-up, its Agentica SDK achieved an unverified 36.08% on the ARC-AGI-3 public eval set, solving 113 of 182 playable levels and fully completing 7 of the 25 available games.
The company’s framing is important. This is not presented as a pure chain-of-thought benchmark run. It is an agentic system result, where the SDK is sandboxed and allowed to run persistent tasks, including puzzle-solving loops. That distinction helps explain why the community paid attention: ARC-style evaluations are increasingly being treated as tests of structured reasoning and interaction, not just next-token fluency.
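To make the distinction concrete, here is a minimal sketch of what an agentic evaluation loop looks like, with persistent state and stepwise feedback. Every name in it (ToyPuzzleEnv, ToyAgent, propose_action) is a hypothetical illustration; Symbolica’s post does not document the Agentica SDK’s actual API.

```python
# Hypothetical sketch of an agentic evaluation loop, NOT Symbolica's
# Agentica SDK. It illustrates how an agent differs from a single
# chain-of-thought pass: it acts, observes feedback, and carries
# persistent state across many steps.

from dataclasses import dataclass, field


@dataclass
class ToyPuzzleEnv:
    """Stand-in for an ARC-AGI-3-style interactive level: the agent
    must raise a hidden counter to a target by trial and error."""
    target: int = 5
    state: int = 0

    def observe(self) -> int:
        return self.state

    def step(self, action: str) -> bool:
        if action == "increment":
            self.state += 1
        elif action == "reset":
            self.state = 0
        return self.state == self.target  # True once the level is solved


@dataclass
class ToyAgent:
    """The agent's memory survives across steps, unlike a one-shot prompt."""
    memory: list = field(default_factory=list)

    def propose_action(self, observation: int) -> str:
        # A real system would call a model here; this stub just
        # records what it saw and keeps incrementing.
        self.memory.append(observation)
        return "increment"


def run_episode(env: ToyPuzzleEnv, agent: ToyAgent, budget: int = 100) -> bool:
    """Run the interaction loop until the level is solved or the step
    budget is exhausted."""
    for _ in range(budget):
        if env.step(agent.propose_action(env.observe())):
            return True
    return False


if __name__ == "__main__":
    print("solved:", run_episode(ToyPuzzleEnv(), ToyAgent()))  # solved: True
```

The structural point is the loop itself: agent memory and environment state persist across steps, which is what separates this setup from a single chain-of-thought completion.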
The gap to reported baselines
Symbolica also emphasized cost efficiency. In its published comparison, Agentica’s 36.08% result is paired with an estimated cost of $1,005, versus 0.25% for Opus 4.6 Max at $8,900 and 0.3% for GPT 5.4 High, for which no cost is listed. Those numbers should be read carefully because the result is explicitly unverified, but they still explain the reaction. The story is not just a higher score; it is a materially different score-cost tradeoff under an agent loop.
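One way to ground that claim is cost per benchmark point, computed directly from the figures quoted in the post. The figures themselves remain unverified, and GPT 5.4 High is omitted because the comparison gives no cost for it.

```python
# Cost per ARC-AGI-3 point, using only the unverified figures quoted
# in Symbolica's comparison.
reported = {
    "Agentica SDK": (36.08, 1_005),   # (score %, estimated cost $)
    "Opus 4.6 Max": (0.25, 8_900),
}

for name, (score, cost) in reported.items():
    print(f"{name}: ${cost / score:,.0f} per point")

# Agentica SDK: $28 per point
# Opus 4.6 Max: $35,600 per point
```

On those reported numbers, the gap is roughly three orders of magnitude per point, which is why the cost framing drew as much attention as the score itself.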
Where the system did best
The breakdown in the post shows especially strong performance on several games: Symbolica lists 97.60 on CN04, 84.16 on LP85, 83.28 on AR25, and 77.59 on FT09. Scores on the remaining games fall off sharply, which is equally informative. The pattern suggests that the current agent stack is not uniformly strong across the benchmark, but it can already dominate specific puzzle families well enough to change the conversation.
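A quick check on the quoted numbers shows how skewed the distribution is: the four listed games average well above the overall figure, so the remaining games must sit far below it.

```python
# The four strongest reported game scores versus the overall result,
# using only the numbers quoted in the post.
top_games = {"CN04": 97.60, "LP85": 84.16, "AR25": 83.28, "FT09": 77.59}

top_avg = sum(top_games.values()) / len(top_games)
print(f"avg of listed top games: {top_avg:.2f}")  # -> 85.66
print("overall reported score: 36.08")
```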
For AI readers, the significance is not that ARC-AGI-3 is solved. It clearly is not. The more useful takeaway is that tool-using, persistent agent systems may now be the more interesting variable than chain-of-thought prompting alone. r/singularity reacted to this as a sign that benchmark progress is moving from passive reasoning toward active orchestration. If future independent verification lands near the same range, this result will look like an early marker of that shift rather than an isolated leaderboard anomaly.
Primary source: Symbolica’s ARC-AGI-3 post. Community discussion: r/singularity.