Cursor details Composer 2’s training stack, from continued pretraining to real-world RL
Original: Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.
What Cursor posted on X
On March 26, 2026, Cursor said it had published more research on Composer 2 and claimed that real-time reinforcement learning lets it ship improved checkpoints every five hours. That is a strong claim: most public model announcements focus on benchmark snapshots or periodic releases, while Cursor is emphasizing a training loop that runs close to deployment cadence.
What the technical report adds
Cursor’s technical report, published March 27, says Composer 2 is trained in two phases: continued pretraining on Kimi K2.5 with a code-heavy data mix, then large-scale RL in realistic Cursor sessions using the same tools and harness as the deployed product. The company argues that lower pretraining loss translates into better downstream agent performance, while RL improves both average and best-of-K outcomes.
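The distinction between average and best-of-K outcomes can be made concrete with a small sketch. Assuming each of K attempts at a task succeeds independently with probability p (an idealization; the function name and numbers below are illustrative, not from Cursor's report):

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds,
    given a per-attempt success probability p."""
    return 1.0 - (1.0 - p) ** k

# An agent that solves 40% of tasks on a single attempt:
single = pass_at_k(0.4, 1)  # 0.4    (average-case success)
best5 = pass_at_k(0.4, 5)   # ~0.922 (best-of-5 success)
```

Under this model, raising the per-attempt rate lifts both numbers, which is why improving average and best-of-K results together is a meaningful signal rather than a trade-off.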
The report also explains why CursorBench exists. Cursor says public coding benchmarks often over-specify tasks and underrepresent the messy, ambiguous work developers actually hand to coding agents. CursorBench is built from real engineering sessions, with terse prompts and multi-file solutions, and Composer 2 scores 61.3 on that suite, which the report describes as a 37% improvement over Composer 1.5. The same report lists 73.7 on SWE-bench Multilingual and 61.7 on Terminal-Bench.
Why it matters
The interesting part is not just raw benchmark movement. Cursor is making a case that the winning recipe for coding agents is tight feedback from production-like environments, not only more pretraining tokens. Its infrastructure section describes an asynchronous multi-region RL pipeline, custom low-precision kernels for MoE training on Blackwell GPUs, and Anyrun, an internal platform for hundreds of thousands of sandboxed coding environments. That is the stack you build when your product depends on fast iteration over agent behavior, not occasional model refreshes.
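The asynchronous rollout pattern that an infrastructure stack like this serves can be sketched in a few lines. This is a generic fan-out/gather loop, not Anyrun's actual interface; the sandbox function, its scoring rule, and all names here are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_in_sandbox(task_id: int) -> dict:
    """Stand-in for launching one isolated coding environment,
    letting the agent act, and scoring the resulting work."""
    reward = 1.0 if task_id % 3 == 0 else 0.0  # placeholder scoring rule
    return {"task": task_id, "reward": reward}

def collect_rollouts(task_ids, max_workers: int = 8) -> list[dict]:
    """Fan tasks out to sandboxes and gather results as each finishes,
    so one slow environment never blocks the rest of the batch."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_in_sandbox, t) for t in task_ids]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

batch = collect_rollouts(range(12))
```

The design point is that results arrive in completion order, not submission order, which is the property an asynchronous RL pipeline needs when environment runtimes vary widely.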
If Cursor can really compress update cycles to hours, the competitive battleground moves from who launches the biggest checkpoint to who can safely learn from real workflows fastest. That has implications well beyond Cursor itself, because it points toward a future where coding models are tuned continuously around tool use, environment fidelity, and evaluation suites built from live developer work.
Sources: Cursor on X, Cursor technical report.