Claude Fable 5 has moved to the top of Artificial Analysis’s GDPval-AA benchmark with a 1932 score. The result puts Anthropic models in three of the top four slots and raises the bar for long-running agentic knowledge work.
Anthropic is not only shipping a stronger Claude model; it is splitting the same base capability into a broad Fable release and a restricted Mythos track. The package includes $10/$50 token pricing, 30-day safety retention, and automatic fallback to Opus 4.8 for some high-risk requests.
NMR analysis is a slow chemistry bottleneck, and Anthropic says Opus 4.7 matched or beat specialist tools on parts of a 20-compound test. Its average hydrogen NMR error was about ±0.079 ppm.
HN interest centered less on “Claude finds bugs” and more on the shape of a harness security teams can adapt for their own targets.
AI self-improvement is moving from speculation into measurable lab workflow data. Anthropic says Mythos Preview reached about 52x speedups on an optimization task and beat human next-step choices 64% of the time.
Anthropic’s May 29 platform notes move Claude Managed Agents deeper into AWS operations. Webhooks, multiagent orchestration, and self-hosted sandboxes are now available on Claude Platform on AWS, with new IAM actions and a managed policy for self-hosted execution.
The Claude story is no longer only about model quality. Anthropic says its Series H raised $65B at a $965B post-money valuation, while run-rate revenue crossed $47B earlier in May.
Claude Opus 4.8 now has a fast mode that runs the same model at roughly 2.5x speed. Anthropic says the mode is three times cheaper than before, shifting the cost equation for long agent sessions.
Claude Opus 4.8 is showing its strongest early signal in agentic work, not only coding. Artificial Analysis says the model scored 1890 on GDPval-AA, 121 points ahead of GPT-5.5 xhigh.
HN readers focused less on the version number and more on whether same-price upgrades, cheaper fast mode, and Claude Code dynamic workflows will show up in real agent sessions.
DeepSWE reframes coding-agent evaluation with 113 original tasks across 91 repositories. Its first board gives GPT-5.5 a 70.0% pass@1 score, versus 54.2% for Claude Opus 4.7.
Claude products now touch real tools, so the risk question is shifting from model persuasion to execution boundaries. Anthropic says users approved about 93% of Claude Code permission prompts, a number that weakens human-in-the-loop defenses.