Cursor details Composer 2’s training stack, from continued pretraining to real-world RL
Original: Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.
What Cursor posted on X
On March 26, 2026, Cursor said it had published more research on Composer 2 and claimed that real-time reinforcement learning lets it ship improved checkpoints every five hours. That is a strong claim: most public model announcements focus on benchmark snapshots or periodic releases, whereas Cursor was emphasizing a training loop that runs at something close to deployment cadence.
What the technical report adds
Cursor’s technical report, published March 27, says Composer 2 is trained in two phases: continued pretraining of Kimi K2.5 on a code-heavy data mix, then large-scale RL in realistic Cursor sessions that use the same tools and harness as the deployed product. The company argues that lower pretraining loss translates into better downstream agent performance, while RL improves both average and best-of-K outcomes.
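To make the distinction between average and best-of-K outcomes concrete, here is a minimal Python sketch. It is not taken from Cursor's report: the function names, the scoring convention (1.0 for a solved task, 0.0 for a failed one), and the rollout data are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_score(scores_per_task):
    """Mean over tasks of the mean score across that task's rollouts."""
    return mean(mean(scores) for scores in scores_per_task)


def best_of_k(scores_per_task, k):
    """Mean over tasks of the best score among the first k rollouts."""
    return mean(max(scores[:k]) for scores in scores_per_task)


# Hypothetical data (an assumption, not from the report):
# 3 tasks, 4 sampled rollouts each, 1.0 = solved, 0.0 = failed.
rollouts = [
    [0.0, 1.0, 0.0, 0.0],  # solved once in four attempts
    [1.0, 1.0, 0.0, 1.0],  # solved most of the time
    [0.0, 0.0, 0.0, 0.0],  # never solved
]

avg = average_score(rollouts)   # how good a typical single attempt is
bo4 = best_of_k(rollouts, k=4)  # how often at least one of 4 attempts succeeds
print(f"average={avg:.3f} best-of-4={bo4:.3f}")
```

Under this reading, the average reflects what a user sees from a single attempt, while best-of-K reflects whether the model can solve the task at all given several tries. A training method could in principle raise one without the other, which is why the report's claim that RL improves both is worth noting.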
The report also explains why CursorBench exists. Cursor says public coding benchmarks often over-specify tasks and underrepresent the messy, ambiguous work developers actually hand to coding agents. CursorBench is built from real engineering sessions, with terse prompts and multi-file solutions, and Composer 2 scores 61.3 on that suite, which the report describes as a 37% improvement over Composer 1.5. The same report lists 73.7 on SWE-bench Multilingual and 61.7 on Terminal-Bench.
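As a quick sanity check on the CursorBench figures, the arithmetic can be sketched as follows. This assumes the 37% is a relative improvement, which the summary above does not spell out; the helper names are hypothetical, and the implied baseline is a back-of-the-envelope estimate, not a number quoted from the report.

```python
def implied_baseline(new_score, gain):
    """Old score implied by a new score and a relative gain (0.37 = 37%)."""
    return new_score / (1.0 + gain)


def relative_gain(new_score, old_score):
    """Relative improvement of new_score over old_score."""
    return (new_score - old_score) / old_score


# Assumption: "37% improvement" means relative, not percentage points.
baseline = implied_baseline(61.3, 0.37)
print(f"implied Composer 1.5 score: {baseline:.1f}")
print(f"check: {relative_gain(61.3, baseline):.2%}")
```

If the 37% were instead read as percentage points, the implied baseline would be 61.3 − 37 = 24.3, so the interpretation matters when comparing the two models.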
Why it matters
The interesting part is not just raw benchmark movement. Cursor is making the case that the winning recipe for coding agents is tight feedback from production-like environments, not only more pretraining tokens. Its infrastructure section describes an asynchronous multi-region RL pipeline, custom low-precision kernels for MoE training on Blackwell GPUs, and Anyrun, an internal platform for running hundreds of thousands of sandboxed coding environments. That is the stack you build when your product depends on fast iteration over agent behavior rather than occasional model refreshes.
If Cursor can really compress update cycles to hours, the competitive battleground moves from who launches the biggest checkpoint to who can safely learn from real workflows fastest. That has implications well beyond Cursor itself, because it points toward a future where coding models are tuned continuously around tool use, environment fidelity, and evaluation suites built from live developer work.
Sources: Cursor on X, Cursor technical report.