Cursor details Composer 2’s training stack, from continued pretraining to real-world RL
Original: Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.
What Cursor posted on X
On March 26, 2026, Cursor said it had published more research on Composer 2 and claimed that real-time reinforcement learning lets it ship improved checkpoints every five hours. That is a strong claim: most public model announcements focus on benchmark snapshots or periodic releases, while Cursor is emphasizing a training loop that runs close to deployment cadence.
What the technical report adds
Cursor’s technical report, published March 27, says Composer 2 is trained in two phases: continued pretraining on Kimi K2.5 with a code-heavy data mix, then large-scale RL in realistic Cursor sessions using the same tools and harness as the deployed product. The company argues that lower pretraining loss translates into better downstream agent performance, while RL improves both average and best-of-K outcomes.
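The distinction between average and best-of-K outcomes can be made concrete with a small sketch. Assuming each of K attempts at a task succeeds independently with probability p (an idealization; the function name and numbers below are illustrative, not from Cursor's report):

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds,
    given a per-attempt success probability p."""
    return 1.0 - (1.0 - p) ** k

# An agent that solves 40% of tasks on a single attempt:
single = pass_at_k(0.4, 1)  # 0.4    (average-case success)
best5 = pass_at_k(0.4, 5)   # ~0.922 (best-of-5 success)
```

Under this model, raising the per-attempt rate lifts both numbers, which is why improving average and best-of-K results together is a meaningful signal rather than a trade-off.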
The report also explains why CursorBench exists. Cursor says public coding benchmarks often over-specify tasks and underrepresent the messy, ambiguous work developers actually hand to coding agents. CursorBench is built from real engineering sessions, with terse prompts and multi-file solutions, and Composer 2 scores 61.3 on that suite, which the report describes as a 37% improvement over Composer 1.5. The same report lists 73.7 on SWE-bench Multilingual and 61.7 on Terminal-Bench.
Why it matters
The interesting part is not just raw benchmark movement. Cursor is making a case that the winning recipe for coding agents is tight feedback from production-like environments, not only more pretraining tokens. Its infrastructure section describes an asynchronous multi-region RL pipeline, custom low-precision kernels for MoE training on Blackwell GPUs, and Anyrun, an internal platform for hundreds of thousands of sandboxed coding environments. That is the stack you build when your product depends on fast iteration over agent behavior, not occasional model refreshes.
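The asynchronous rollout pattern that an infrastructure stack like this serves can be sketched in a few lines. This is a generic fan-out/gather loop, not Anyrun's actual interface; the sandbox function, its scoring rule, and all names here are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_in_sandbox(task_id: int) -> dict:
    """Stand-in for launching one isolated coding environment,
    letting the agent act, and scoring the resulting work."""
    reward = 1.0 if task_id % 3 == 0 else 0.0  # placeholder scoring rule
    return {"task": task_id, "reward": reward}

def collect_rollouts(task_ids, max_workers: int = 8) -> list[dict]:
    """Fan tasks out to sandboxes and gather results as each finishes,
    so one slow environment never blocks the rest of the batch."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_in_sandbox, t) for t in task_ids]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

batch = collect_rollouts(range(12))
```

The design point is that results arrive in completion order, not submission order, which is the property an asynchronous RL pipeline needs when environment runtimes vary widely.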
If Cursor can really compress update cycles to hours, the competitive battleground moves from who launches the biggest checkpoint to who can safely learn from real workflows fastest. That has implications well beyond Cursor itself, because it points toward a future where coding models are tuned continuously around tool use, environment fidelity, and evaluation suites built from live developer work.
Sources: Cursor on X, Cursor technical report.