Claude Opus 4.7 hits 70% on CursorBench while keeping Opus price

Original: Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.

AI · Apr 17, 2026 · By Insights AI · 2 min read

What the tweet revealed

The Claude account wrote that “Claude Opus 4.7” is its “most capable Opus model yet” and said it handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. That is a material product signal because Anthropic is positioning the new Opus less as a chatbot upgrade and more as a model for delegated, multi-step work.

The account is Anthropic’s product channel for Claude releases and availability changes, so the post is best read together with the linked company page. There, Anthropic says Opus 4.7 is available across Claude products, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6 at $5 per million input tokens and $25 per million output tokens, which matters for teams already budgeting around Opus-class workloads.
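For teams budgeting around those rates, the per-token math is simple enough to sketch. Below is a minimal cost estimator using the $5/$25 per-million-token prices quoted above; the workload volumes in the example are hypothetical, not figures from the announcement.

```python
# Rough cost estimate at the quoted Opus pricing.
# Rates come from the article ($5 per 1M input tokens, $25 per 1M
# output tokens); the example token volumes are made up.

INPUT_RATE_PER_M = 5.00    # USD per million input tokens
OUTPUT_RATE_PER_M = 25.00  # USD per million output tokens

def workload_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the quoted per-token rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Hypothetical month of agent traffic: 40M input, 8M output tokens.
print(workload_cost(40_000_000, 8_000_000))  # 40*5 + 8*25 = 400.0
```

Because the rates are unchanged from Opus 4.6, a budget built this way carries over to 4.7 without adjustment.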

The benchmark signal

The strongest number is from CursorBench: Anthropic says Opus 4.7 clears 70%, up from 58% for Opus 4.6. Early partner notes also point to gains in complex software workflows, visual understanding, and cyber-related guardrails. Those claims still need independent replication, but they show where Anthropic wants the model judged: coding autonomy, instruction fidelity, and verification before the model returns a result.

That framing fits the current agent market. Developer tools are no longer asking only whether a model can solve a benchmark task; they are asking whether it can hold context across a messy repo, check its own work, and avoid handing back plausible but brittle changes. Opus 4.7’s launch page spends unusual space on partner workflow results, which suggests Anthropic is competing for production agent usage rather than only leaderboard placement.

What to watch next is whether third-party coding agents reproduce the 70% versus 58% CursorBench gap, whether the same-price API leads teams to switch from 4.6 quickly, and how the cyber safeguards behave as the model enters more enterprise environments. Source: Claude X post · Anthropic launch page


© 2026 Insights. All rights reserved.