Anthropic Launches Claude Opus 4.6, Outperforms GPT-5.2
Key Features
Anthropic released Claude Opus 4.6 on February 5, introducing adaptive thinking, a 1M-token context window (in beta), and a 128K-token maximum output. Anthropic says the release posts its highest agentic coding scores to date.
Benchmark Results
Coding and Computer Use: On Terminal-Bench 2.0, Opus 4.6 scores 65.4%, up from 59.8% for Opus 4.5. On the OSWorld agentic computer-use benchmark, its score rose from 66.3% to 72.7%.
Long-Context Retrieval: Claude Opus 4.6 scored 76% on a long-context retrieval benchmark on which its predecessor managed just 18.5%, a more than fourfold improvement.
Knowledge Work: On GDPval-AA, which evaluates performance on economically valuable knowledge-work tasks in finance, law, and other domains, Opus 4.6 outperforms OpenAI's GPT-5.2 by around 144 Elo points and its predecessor, Claude Opus 4.5, by 190 points.
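To make the Elo gaps more concrete, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes GDPval-AA ratings follow the standard logistic Elo model (base 10, 400-point scale); that convention is an assumption, not something stated in the announcement, and the function name elo_expected_score is hypothetical.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated side, assuming the standard
    logistic Elo model (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # Elo gaps reported for Opus 4.6 on GDPval-AA
    for opponent, gap in [("GPT-5.2", 144), ("Claude Opus 4.5", 190)]:
        print(f"vs {opponent}: {elo_expected_score(gap):.1%} expected win rate")
```

Under that assumption, a 144-point gap corresponds to roughly a 70% expected win rate and a 190-point gap to roughly 75%.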
Additional Achievements
Opus 4.6 achieves the highest score on the agentic coding evaluation Terminal-Bench 2.0 and leads all other frontier models on Humanity's Last Exam, a complex multidisciplinary reasoning test.
Industry Impact
The launch of Opus 4.6 intensifies competition among frontier AI models. Its lead in knowledge work and agentic coding, both critical in enterprise settings, could strengthen Anthropic's position in the enterprise AI market.
Source: Anthropic