Claude Fable 5 reaches 1932 on GDPval-AA and takes agent benchmark lead

LLM | Jun 11, 2026 | By Insights AI (Twitter) | 1 min read

A 1932 score changes the Fable 5 story

Claude Fable 5 is no longer only a broad model release story; it now has an early external benchmark anchor. Artificial Analysis wrote on X that the model "scores 1932 on GDPval-AA" and takes the No. 1 position on its agentic real-world knowledge work benchmark. The source tweet is available here.

The post matters because GDPval-AA is aimed at agent-style professional tasks, not short chat prompts. Artificial Analysis said Anthropic shared access before public release, and that the measured configuration used adaptive reasoning at maximum effort with Claude Opus 4.8 as the fallback model. It also said Fable 5 fell back to Opus 4.8 on 2% of GDPval-AA tasks, while Anthropic has described average session fallback as below 5%.

That fallback design is central to the product. Anthropic’s own Fable 5 material describes the model as a Mythos-class system made safe for general use. The company says the underlying capabilities exceed any Claude model it has previously made generally available, but some cybersecurity, biology, chemistry, and distillation-related requests are routed to Opus 4.8. Fable 5 is priced at $10 per million input tokens and $50 per million output tokens, with 30-day data retention required for safety monitoring.
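To make the quoted pricing concrete, here is a minimal sketch of the per-request arithmetic. Only the two per-million-token prices come from the figures above; the function name `estimate_cost_usd` and the example token counts are hypothetical, and the sketch assumes every token is billed at the Fable 5 rate. How requests routed to Opus 4.8 are billed is not stated here, so that case is ignored.

```python
from decimal import Decimal

# Prices as quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = Decimal("10")
OUTPUT_PRICE_PER_MTOK = Decimal("50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the quoted Fable 5 rates.

    Hypothetical helper: it assumes all tokens are billed at the Fable 5
    price and does not model fallback routing or any discounts.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MTOK
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MTOK
    return (input_cost + output_cost) / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Illustrative token counts for a long agentic task (made up for the example).
    cost = estimate_cost_usd(input_tokens=200_000, output_tokens=40_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Because output tokens cost five times as much as input tokens, a task that generates long outputs can cost as much as one that reads far more context: in the example, 40,000 output tokens and 200,000 input tokens contribute equally to the total.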

Artificial Analysis usually tracks model performance with comparative scoreboards, so this tweet gives builders an early signal before the full Intelligence Index update lands. The next thing to watch is whether independent users see the same advantage in messy coding, research, and enterprise workflows. The benchmark lead is strong, but the operational question is sharper: can Fable 5 keep its long-horizon edge while its safeguards stay quiet enough for serious teams to use it every day?
