Claude Opus 4.7 clears 70% on CursorBench while keeping Opus 4.6 pricing
Original: Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.
What the tweet revealed
The Claude account called Claude Opus 4.7 its “most capable Opus model yet” and said it handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. The wording matters: Anthropic is positioning the new Opus less as a chatbot upgrade and more as a model for delegated, multi-step work.
The account is Anthropic’s product channel for Claude releases and availability changes, so the post is best read together with the linked company page. There, Anthropic says Opus 4.7 is available across Claude products, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6 at $5 per million input tokens and $25 per million output tokens, which matters for teams already budgeting around Opus-class workloads.
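For teams doing that budgeting, the quoted rates translate into a simple per-workload estimate. The sketch below is illustrative only: the function name and the token counts are hypothetical, and it assumes flat list pricing at the two rates quoted above, ignoring any discounts or other pricing rules the article does not cover.

```python
# Rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one workload at the quoted flat rates.

    Assumption: no discounts or other pricing modifiers apply.
    """
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical agent run: 2M input tokens, 300k output tokens.
cost = estimate_cost_usd(2_000_000, 300_000)
print(f"${cost:.2f}")  # $17.50
```

Because the rates match Opus 4.6, the same estimate applies to either model for an identical token count; any real difference in spend would come from how many tokens each model uses to finish a task.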
The benchmark signal
The strongest number is from CursorBench. Anthropic says Opus 4.7 clears 70%, compared with 58% for Opus 4.6, a gain of at least 12 percentage points. Early partner notes also point to gains in complex software workflows, visual acuity, and cyber-related guardrails. Those claims still need independent repeat tests, but they show where Anthropic wants the model judged: coding autonomy, instruction fidelity, and verification before the model returns a result.
That framing fits the current agent market. Developer tools are no longer asking only whether a model can solve a benchmark task; they are asking whether it can hold context across a messy repo, check its own work, and avoid handing back plausible but brittle changes. Opus 4.7’s launch page spends unusual space on partner workflow results, which suggests Anthropic is competing for production agent usage rather than only leaderboard placement.
What to watch next is whether third-party coding agents reproduce the 70% versus 58% CursorBench gap, whether the same-price API leads teams to switch from 4.6 quickly, and how the cyber safeguards behave as the model enters more enterprise environments.
Source: Claude X post · Anthropic launch page