OpenAI puts GPT-5.5 live, claiming 82.7% on Terminal-Bench
Original: Introducing GPT-5.5. A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.
OpenAI used its April 23 source post to frame GPT-5.5 as “a new class of intelligence for real work,” which is a sharper claim than a routine model bump. The company is pitching it as a system that can hold a messy objective, use tools, check its own work, and keep moving across a long task instead of waiting for step-by-step prompting. The rollout started in ChatGPT and Codex, with a separate GPT-5.5 Pro tier for harder questions.
The supporting product page makes the bet measurable. OpenAI says GPT-5.5 reaches 82.7% on Terminal-Bench 2.0, up from 75.1% for GPT-5.4, and 73.1% on its internal Expert-SWE eval for long-horizon coding work. On the same page, OpenAI says the model matches GPT-5.4 per-token latency in real-world serving while using fewer tokens in Codex. That combination matters more than a headline IQ bump: the whole point is to let an agent stay on task longer without burning through retries or stalling on infrastructure overhead.
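The claimed trade (same per-token latency, fewer tokens) can be made concrete with a toy calculation. Every number below is an illustrative assumption, not a figure from OpenAI: at equal per-token latency, end-to-end wall-clock time falls in proportion to the token reduction.

```python
# Toy illustration only -- the latency and token counts here are
# assumptions for arithmetic, not OpenAI's published figures.
per_token_latency_ms = 30.0   # assumed, identical for both models
tokens_old = 10_000           # assumed token usage for a GPT-5.4 task
tokens_new = 8_000            # assumed 20% reduction for GPT-5.5

# At fixed per-token latency, wall-clock time scales with token count.
time_old_s = tokens_old * per_token_latency_ms / 1000
time_new_s = tokens_new * per_token_latency_ms / 1000
speedup = time_old_s / time_new_s

print(time_old_s, time_new_s, round(speedup, 2))  # → 300.0 240.0 1.25
```

Under these assumed numbers, a 20% token reduction alone yields a 1.25x end-to-end speedup for an agent loop, which is why the efficiency claim matters as much as the benchmark scores.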
The account context matters too. The main OpenAI account is usually where flagship releases land, while @OpenAIDevs handled the developer follow-up thread and the April 24 API update. That second layer makes this feel less like a teaser and more like a coordinated platform push: ChatGPT for interactive work, Codex for longer-running engineering tasks, and API access for companies that want the same model inside their own systems. OpenAI also says GPT-5.5 is available in Codex with a 400K context window, which reinforces that this release is aimed at sustained computer work rather than short-answer chat.
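For teams eyeing the API layer mentioned above, the request presumably takes the familiar chat-completions shape. A minimal stdlib-only sketch, assuming `gpt-5.5` is the model identifier (the launch coverage does not quote the exact API string, so confirm it against the provider's models list before relying on it):

```python
import json

def build_chat_request(prompt: str, model: str = "gpt-5.5") -> dict:
    """Assemble a standard chat-completions request payload.

    The "gpt-5.5" model string is an assumption for illustration;
    the exact identifier should be verified against the API's
    models endpoint once access is available.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = json.dumps(build_chat_request("Summarize this repo's failing tests"))
print(payload)
```

This only constructs the JSON body; sending it requires the usual authenticated HTTPS call, which is unchanged by this release as far as the launch materials indicate.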
The next signal to watch is whether outside harnesses confirm the same gap over GPT-5.4 once teams run it on their own repos, browser workflows, and internal research tasks. The other open question is price-performance at scale. If GPT-5.5 really keeps its latency while reducing token burn, it could shift which tasks people hand off to agents by default. If that efficiency breaks down under heavy enterprise use, the story becomes much smaller than the launch framing suggests.
Related Articles
OpenAI is pitching GPT-5.5 as more than a routine model refresh. With 82.7% on Terminal-Bench 2.0, 58.6% on SWE-Bench Pro, and a claim that it keeps GPT-5.4-level latency, the company is resetting expectations for long-running coding agents.
OpenAI says more than 3 million developers use Codex each week, and the desktop app is now moving beyond code edits. The update adds background computer use on macOS, an in-app browser, gpt-image-1.5 image generation, 90+ new plugins, PR review workflows, SSH devboxes in alpha, automations, and memory preview.
Why it matters: this is one of the first external benchmark reads to land right after the GPT-5.5 launch. Artificial Analysis said GPT-5.5 moved 3 points clear on its Intelligence Index, while the full index run still became roughly 20% more expensive.