Cursor publishes the Composer 2 technical report detailing continued pretraining and large-scale RL for coding agents

LLM · Mar 30, 2026 · By Insights AI (Twitter) · 2 min read
Overview

Cursor announced a technical report for Composer 2, its latest model for agentic software engineering, on X on March 24, 2026. In a follow-up reply, the company linked the full PDF, turning the post from a simple launch note into one of the clearest first-party disclosures yet of how a production coding model is trained and evaluated.

According to the report, Composer 2 is a domain-specialized model built for long-horizon coding tasks in the same harness and tool environment used by deployed Cursor agents. Cursor says the goal was to minimize train-test mismatch by training against workflows that resemble real user sessions rather than narrow benchmark prompts.

Training recipe and architecture

Cursor says Composer 2 was trained in two stages. First came continued pretraining on a code-dominated data mix to improve knowledge and latent coding ability. The report says this stage used Kimi K2.5, a 1.04-trillion-parameter mixture-of-experts model with 32 billion active parameters, as the base model. After initial training at a 32k-token context, Cursor extended the context window to 256k tokens and added a short supervised fine-tuning phase on targeted coding tasks.
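The report does not say how the 32k-to-256k context extension was done, but one common approach is to rescale the model's rotary position embedding (RoPE) frequencies before continued training at the longer length. The sketch below illustrates that idea; the scaling rule and all parameter names are assumptions, not details from the report.

```python
# Illustrative sketch: "NTK-aware"-style RoPE base rescaling for
# long-context extension (32k -> 256k). Not Cursor's actual method;
# head_dim and the scaling rule are assumptions for the example.
def scaled_rope_freqs(head_dim: int,
                      base: float = 10_000.0,
                      old_ctx: int = 32_768,
                      new_ctx: int = 262_144) -> list[float]:
    """Return RoPE inverse frequencies with the base enlarged so that
    positions up to new_ctx fall in the range seen during 32k training."""
    scale = new_ctx / old_ctx  # 8x extension
    new_base = base * scale ** (head_dim / (head_dim - 2))
    return [1.0 / new_base ** (2 * i / head_dim) for i in range(head_dim // 2)]

freqs = scaled_rope_freqs(head_dim=128)
print(len(freqs))  # 64 inverse frequencies, one per rotary pair
```

After rescaling, the model is typically trained further on long documents so attention adapts to the stretched position range, which matches the report's description of a dedicated long-context extension stage.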

The second stage was large-scale reinforcement learning in environments designed to mirror real Cursor sessions. The company says it trained against tasks spanning debugging, new features, refactors, documentation, testing, code review, DevOps, and migrations. The report also describes self-summarization for long trajectories, multi-token prediction for faster serving, and reward shaping to balance speed, tool use, and code quality.
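The report describes reward shaping that balances speed, tool use, and code quality, but gives no formula. A minimal sketch of what such a shaped reward could look like is below; the weights, budgets, and signal names are all assumptions for illustration.

```python
# Hypothetical reward shaping for an RL coding agent: a sparse
# correctness signal plus small penalties for latency, tool use,
# and residual code-quality issues. All constants are illustrative.
from dataclasses import dataclass

@dataclass
class Trajectory:
    tests_passed: bool    # did the final patch pass the task's checks?
    wall_clock_s: float   # end-to-end episode time
    tool_calls: int       # number of tool invocations
    lint_errors: int      # residual code-quality issues

def shaped_reward(t: Trajectory,
                  time_budget_s: float = 600.0,
                  tool_budget: int = 50) -> float:
    r = 1.0 if t.tests_passed else 0.0                    # primary signal
    r -= 0.10 * min(t.wall_clock_s / time_budget_s, 1.0)  # speed penalty
    r -= 0.05 * min(t.tool_calls / tool_budget, 1.0)      # tool-use cost
    r -= 0.02 * min(t.lint_errors, 5)                     # quality penalty
    return r

print(shaped_reward(Trajectory(True, 120.0, 10, 0)))  # 0.97
```

Capping each penalty keeps the correctness term dominant, so the agent is never better off failing the task quickly than solving it slowly.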

Benchmark results

On evaluation, Cursor reports 61.3% on CursorBench, 73.7% on SWE-bench Multilingual, and 61.7% on Terminal-Bench in its harness. It frames the result as frontier-level coding performance at a lower serving cost than state-of-the-art API pricing. The most interesting claim is not the benchmark table itself but the methodology: CursorBench is built from real internal engineering sessions, with larger code changes and shorter, less-specified prompts than public coding benchmarks.
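Harness-style scores like the ones above generally reduce to a resolved-task rate: run the agent on each benchmark task, apply that task's checks, and report the fraction passed. A minimal sketch, with task IDs and result shapes invented for the example:

```python
# Minimal sketch of harness-style benchmark scoring: the reported
# percentage is the fraction of tasks whose final state passes the
# task's checks. Task IDs here are illustrative.
def pass_rate(results: list[tuple[str, bool]]) -> float:
    """results: (task_id, passed) pairs produced by the eval harness."""
    if not results:
        return 0.0
    return sum(1 for _, ok in results if ok) / len(results)

runs = [("task-001", True), ("task-002", False), ("task-003", True)]
print(f"{pass_rate(runs):.1%}")  # 66.7%
```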

That makes the report worth watching beyond Cursor itself. As coding agents move from autocomplete into longer autonomous workflows, first-party transparency about training environments, reward design, and benchmark construction is becoming strategically important. Primary source: Composer 2 Technical Report.

