Anthropic Launches Claude Opus 4.6, Outperforms GPT-5.2
Key Features
Anthropic released Claude Opus 4.6 on February 5, introducing adaptive thinking, a 1M token context window in beta, and 128K max output tokens. The release posts the highest agentic coding scores of any Anthropic model to date.
Benchmark Results
Coding Performance: On Terminal-Bench 2.0, Opus 4.6 scores 65.4%, up from 59.8% for Opus 4.5. On the OSWorld agentic computer use benchmark, its score rose from 66.3% to 72.7%.
Long-Context Retrieval: Claude Opus 4.6 scored 76% on a long-context retrieval benchmark where its predecessor managed just 18.5%, a more than 4x improvement.
Knowledge Work: On GDPval-AA, which evaluates performance on economically valuable knowledge work tasks in finance, legal, and other domains, Opus 4.6 outperforms OpenAI's GPT-5.2 by around 144 Elo points and its own predecessor, Claude Opus 4.5, by 190 points.
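To give a sense of what those Elo gaps mean in practice, the sketch below converts a rating gap into an expected head-to-head win rate. It assumes GDPval-AA ratings follow the standard logistic Elo model with a 400-point scale, which the article does not confirm; the function name `elo_win_probability` is illustrative only.

```python
def elo_win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate for the higher-rated side.

    Assumes the standard logistic Elo model with a 400-point scale;
    the benchmark's actual rating method may differ.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Rating gaps reported in the article for Opus 4.6.
for label, gap in [("vs GPT-5.2", 144), ("vs Claude Opus 4.5", 190)]:
    print(f"{label}: {elo_win_probability(gap):.1%}")
```

Under that assumption, a 144-point gap corresponds to winning roughly 70% of pairwise comparisons, and a 190-point gap to roughly 75%.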
Additional Achievements
Opus 4.6 achieves the highest score on the agentic coding evaluation Terminal-Bench 2.0 and leads all other frontier models on Humanity's Last Exam, a complex multidisciplinary reasoning test.
Industry Impact
The launch of Opus 4.6 signals a new phase in AI model competition. Its lead in knowledge work and coding agent performance, both critical for enterprise environments, is expected to strengthen Anthropic's position in the enterprise AI market.
Source: Anthropic