OpenAI puts GPT-5.5 live, claiming 82.7% on Terminal-Bench
Original: Introducing GPT-5.5. A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.
OpenAI used its April 23 source post to frame GPT-5.5 as “a new class of intelligence for real work,” which is a sharper claim than a routine model bump. The company is pitching it as a system that can hold a messy objective, use tools, check its own work, and keep moving across a long task instead of waiting for step-by-step prompting. The rollout started in ChatGPT and Codex, with a separate GPT-5.5 Pro tier for harder questions.
The supporting product page makes the bet measurable. OpenAI says GPT-5.5 reaches 82.7% on Terminal-Bench 2.0, up from 75.1% for GPT-5.4, and 73.1% on its internal Expert-SWE eval for long-horizon coding work. On the same page, OpenAI says the model matches GPT-5.4 per-token latency in real-world serving while using fewer tokens in Codex. That combination matters more than a headline IQ bump: the whole point is to let an agent stay on task longer without burning through retries or stalling on infrastructure overhead.
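The claimed trade (same per-token latency, fewer tokens) can be made concrete with a toy calculation. Every number below is an illustrative assumption, not a figure from OpenAI: at equal per-token latency, end-to-end wall-clock time falls in proportion to the token reduction.

```python
# Toy illustration only -- the latency and token counts here are
# assumptions for arithmetic, not OpenAI's published figures.
per_token_latency_ms = 30.0   # assumed, identical for both models
tokens_old = 10_000           # assumed token usage for a GPT-5.4 task
tokens_new = 8_000            # assumed 20% reduction for GPT-5.5

# At fixed per-token latency, wall-clock time scales with token count.
time_old_s = tokens_old * per_token_latency_ms / 1000
time_new_s = tokens_new * per_token_latency_ms / 1000
speedup = time_old_s / time_new_s

print(time_old_s, time_new_s, round(speedup, 2))  # → 300.0 240.0 1.25
```

Under these assumed numbers, a 20% token reduction alone yields a 1.25x end-to-end speedup for an agent loop, which is why the efficiency claim matters as much as the benchmark scores.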
The account context matters too. The main OpenAI account is usually where flagship releases land, while @OpenAIDevs handled the developer follow-up thread and the April 24 API update. That second layer makes this feel less like a teaser and more like a coordinated platform push: ChatGPT for interactive work, Codex for longer-running engineering tasks, and API access for companies that want the same model inside their own systems. OpenAI also says GPT-5.5 is available in Codex with a 400K context window, which reinforces that this release is aimed at sustained computer work rather than short-answer chat.
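For teams eyeing the API layer mentioned above, the request presumably takes the familiar chat-completions shape. A minimal stdlib-only sketch, assuming `gpt-5.5` is the model identifier (the launch coverage does not quote the exact API string, so confirm it against the provider's models list before relying on it):

```python
import json

def build_chat_request(prompt: str, model: str = "gpt-5.5") -> dict:
    """Assemble a standard chat-completions request payload.

    The "gpt-5.5" model string is an assumption for illustration;
    the exact identifier should be verified against the API's
    models endpoint once access is available.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = json.dumps(build_chat_request("Summarize this repo's failing tests"))
print(payload)
```

This only constructs the JSON body; sending it requires the usual authenticated HTTPS call, which is unchanged by this release as far as the launch materials indicate.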
The next signal to watch is whether outside harnesses confirm the same gap over GPT-5.4 once teams run it on their own repos, browser workflows, and internal research tasks. The other open question is price-performance at scale. If GPT-5.5 really keeps its latency while reducing token burn, it could shift which tasks people hand off to agents by default. If that efficiency breaks down under heavy enterprise use, the story becomes much smaller than the launch framing suggests.
Related Articles
OpenAI is pitching GPT-5.5 as more than a routine model refresh. With 82.7% on Terminal-Bench 2.0, 58.6% on SWE-Bench Pro, and a claim that it keeps GPT-5.4-level latency, the company is resetting expectations for long-running coding agents.
OpenAI says more than 3 million developers use Codex each week, and the desktop app is now moving beyond code edits. The update adds background computer use on macOS, an in-app browser, gpt-image-1.5 image generation, 90+ new plugins, PR review workflows, SSH devboxes in alpha, automations, and memory preview.
Why it matters: this is one of the first external benchmark reads to land right after the GPT-5.5 launch. Artificial Analysis said GPT-5.5 moved 3 points clear on its Intelligence Index, while the full index run still became roughly 20% more expensive.