Claude Fable 5 reaches 1932 on GDPval-AA and takes agent benchmark lead
A 1932 score changes the Fable 5 story
Claude Fable 5 is no longer only a broad model release story; it now has an early external benchmark anchor. Artificial Analysis wrote on X that the model "scores 1932 on GDPval-AA" and takes the No. 1 position on its agentic real-world knowledge work benchmark.
The post matters because GDPval-AA is aimed at agent-style professional tasks, not short chat prompts. Artificial Analysis said Anthropic shared access before public release, and that the measured configuration used adaptive reasoning at maximum effort with Claude Opus 4.8 as the fallback model. It also said Fable 5 fell back to Opus 4.8 on 2% of GDPval-AA tasks, while Anthropic has described average session fallback as below 5%.
That fallback design is central to the product. Anthropic’s own Fable 5 material describes the model as a Mythos-class system made safe for general use. The company says the underlying capabilities exceed any Claude model it has previously made generally available, but some cybersecurity, biology, chemistry, and distillation-related requests are routed to Opus 4.8. Fable 5 is priced at $10 per million input tokens and $50 per million output tokens, with 30-day data retention required for safety monitoring.
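To make the quoted pricing concrete, here is a minimal back-of-the-envelope sketch. It uses only the two list prices stated above; the function name and the example token counts are hypothetical, and it ignores anything the article does not cover, such as how requests routed to Opus 4.8 are billed or any caching or batch discounts.

```python
# Prices as quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 10.0
OUTPUT_PRICE_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost at the quoted list prices (hypothetical helper)."""
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Assumed example: an agent session that reads 400k tokens and writes 60k.
# Input: 0.4 * $10 = $4.00; output: 0.06 * $50 = $3.00.
print(f"${estimate_cost_usd(400_000, 60_000):.2f}")  # prints $7.00
```

Because output tokens cost five times as much as input tokens, long agentic runs that generate a lot of text will tend to be dominated by the output side of the bill.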
Artificial Analysis usually tracks model performance with comparative scoreboards, so this tweet gives builders an early signal before the full Intelligence Index update lands. The next thing to watch is whether independent users see the same advantage in messy coding, research, and enterprise workflows. The benchmark lead is strong, but the operational question is sharper: can Fable 5 keep its long-horizon edge while its safeguards stay quiet enough for serious teams to use it every day?