Cursor puts GPT-5.5 atop CursorBench at 72.8% and halves price


LLM · Apr 26, 2026 · By Insights AI (Twitter) · 2 min read

The headline from Cursor’s latest X post is not just model availability. It is that GPT-5.5 entered the product with both a concrete benchmark claim and a temporary price cut attached. Cursor says GPT-5.5 is now available in the editor, currently ranks first on CursorBench at 72.8%, and is being sold at 50% off through May 2. In a market where many coding-model updates arrive as vague “feels better” claims, that combination is unusually specific.

“It’s currently the top model on CursorBench at 72.8%.”

That sentence comes directly from Cursor’s source tweet. A matching forum thread added the pricing details and clarified the promotion window after users spotted inconsistent dates in the UI. According to Cursor staff, list pricing is $5.00 per million input tokens, $0.50 for cached input, and $30.00 for output; the temporary discount cuts those to $2.50, $0.25, and $15.00 respectively through the end of May 2. That matters because output-token cost is often what makes frontier coding models hard to use at scale.
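To make the discount concrete, here is a minimal sketch that compares what a single agent session would cost under the list and discounted rates quoted above. The per-million-token prices come from the post; the token counts are hypothetical illustration values, not real usage data.

```python
# Per-million-token rates from Cursor's forum post ($/M tokens).
LIST = {"input": 5.00, "cached_input": 0.50, "output": 30.00}
# 50% promotional discount, running through May 2.
DISCOUNT = {k: v / 2 for k, v in LIST.items()}

def session_cost(tokens: dict, rates: dict) -> float:
    """Dollar cost of one session given token counts and $/M-token rates."""
    return sum(tokens[k] * rates[k] / 1_000_000 for k in tokens)

# Hypothetical multi-file refactor: heavy cached context, modest fresh
# input, and a long agent transcript on the output side.
tokens = {"input": 200_000, "cached_input": 800_000, "output": 60_000}

print(f"list:       ${session_cost(tokens, LIST):.2f}")      # $3.20
print(f"discounted: ${session_cost(tokens, DISCOUNT):.2f}")  # $1.60
```

Even in this made-up session, output tokens dominate the bill, which is why the $30.00-to-$15.00 output cut matters more in practice than the input-side changes.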

The more interesting context is CursorBench itself. In Cursor’s March research post, How we compare model quality in Cursor, the company says CursorBench is built from real engineering sessions rather than public repository issues. It argues that the suite tracks actual developer outcomes better than public benchmarks, uses agentic grading, and now covers larger multi-file, tool-using tasks. Cursor also says the current CursorBench-3 task scope has roughly doubled from the initial version and creates more separation among frontier models than saturated public evals.

That does not make 72.8% a neutral industry crown. CursorBench is still an internal benchmark run by the company that sells the product. But it does make the number more relevant than a generic leaderboard screenshot, because the benchmark is explicitly trying to mirror the kinds of underspecified, multi-step tasks developers give coding agents every day. For product users, that is often the right question: not which model wins in abstract, but which one gets more real work over the line inside the tool they already use.

The @cursor_ai account usually mixes release notes, agent features, and evaluation methodology, and this post follows that pattern closely. What to watch next is whether independent usage reports match the 72.8% claim, whether GPT-5.5 keeps its lead as other coding agents update, and whether the economics still make sense after the discount ends on May 2. The primary sources are the tweet, Cursor’s forum post, and the CursorBench methodology note.




© 2026 Insights. All rights reserved.