Cursor releases Composer 2 technical report, detailing the coding agent's training pipeline and benchmark results

Overview

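Cursor announced the release of the Composer 2 technical report on X on March 24, 2026. A follow-up reply linked to the full PDF, which makes this announcement more than a launch notice: it is a primary source that describes, in relatively concrete terms, how a production coding model is trained and evaluated.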

According to the report, Composer 2 is a domain-specialized model built for agentic software engineering. Cursor says it trained the model inside the same harness and tool environment that the deployed model uses, in order to reduce the train-test mismatch between narrow benchmark-style prompts and real user sessions.

Training approach and architecture

Cursor says Composer 2 was trained in two stages. The first is continued pretraining on a code-dominated data mix, intended to raise the model's coding knowledge and latent ability. According to the report, the base model is Kimi K2.5, a Mixture-of-Experts architecture with 1.04T total parameters and 32B active parameters. The model then went through training at a 32k-token length, a long-context extension up to 256k tokens, and a short SFT stage on targeted coding tasks.
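To put the Mixture-of-Experts figures in perspective, the arithmetic below works out what share of the parameters is active for each token. It uses only the two numbers quoted from the report; the function and constant names are illustrative, not taken from Cursor's material.

```python
# Parameter counts as quoted from the report in the paragraph above.
TOTAL_PARAMS = 1.04e12   # 1.04T total parameters
ACTIVE_PARAMS = 32e9     # 32B parameters active per token


def active_fraction(active: float, total: float) -> float:
    """Return the share of parameters used when processing a single token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"{fraction:.2%} of parameters are active per token")  # about 3.08%
```

Roughly 3% of the weights participate in each forward pass, which is the usual reason a sparse model of this size can be served at a cost closer to that of a much smaller dense model.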

The second stage is large-scale reinforcement learning carried out in environments that resemble real Cursor sessions. Cursor says the tasks span categories such as debugging, new features, refactoring, documentation, testing, code review, DevOps, and migration. The report also says the model uses self-summarization for long tasks, that multi-token prediction was introduced to speed up serving, and that the reward design tries to balance speed, tool use, and code quality.
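The article does not give the actual reward formula, so the following is only a minimal sketch of what balancing speed, tool use, and code quality could look like as a weighted sum. Every field name, weight, and the time budget are hypothetical assumptions chosen for illustration; none of them comes from the report.

```python
from dataclasses import dataclass


@dataclass
class EpisodeStats:
    """Hypothetical per-episode measurements (not from the report)."""
    tests_passed: float      # fraction of checks passed, in [0, 1]
    wall_seconds: float      # time the agent took to finish
    tool_calls: int          # total tool invocations
    failed_tool_calls: int   # invocations that returned an error
    quality_score: float     # assumed code-quality rating, in [0, 1]


def composite_reward(
    stats: EpisodeStats,
    time_budget: float = 600.0,  # assumed budget in seconds
    w_task: float = 1.0,         # all weights are illustrative assumptions
    w_speed: float = 0.2,
    w_tools: float = 0.1,
    w_quality: float = 0.3,
) -> float:
    """Combine task success, speed, tool use, and quality into one scalar."""
    # Speed term: 1.0 for an instant finish, 0.0 at or beyond the budget.
    speed = max(0.0, 1.0 - stats.wall_seconds / time_budget)
    # Tool term: share of tool calls that succeeded (1.0 if none were made).
    if stats.tool_calls == 0:
        tool_success = 1.0
    else:
        tool_success = 1.0 - stats.failed_tool_calls / stats.tool_calls
    return (
        w_task * stats.tests_passed
        + w_speed * speed
        + w_tools * tool_success
        + w_quality * stats.quality_score
    )


fast = EpisodeStats(1.0, 120.0, 10, 1, 0.8)
slow = EpisodeStats(1.0, 540.0, 10, 1, 0.8)
print(composite_reward(fast) > composite_reward(slow))  # True
```

With equal task success and quality, the faster episode scores higher, which is the kind of trade-off a reward of this shape encodes. A production system would likely need guards against reward hacking, which this sketch leaves out.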

Published benchmark numbers

For evaluation, Cursor reports 61.3% on CursorBench, 73.7% on SWE-bench Multilingual, and 61.7% on Terminal-Bench. The company interprets these results as frontier-level coding performance delivered at a serving cost below state-of-the-art model API pricing. The more interesting part is not the score table but the evaluation philosophy. Cursor claims that CursorBench uses tasks drawn from real internal engineering sessions, with larger code changes and shorter, less explicit prompts than public benchmarks.

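The report matters because, as competition among coding agents moves beyond autocomplete toward long-horizon autonomous workflows, how transparently a vendor explains its training environment, reward design, and benchmark construction is becoming a strategic differentiator. Primary source: Composer 2 Technical Report.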
