OpenAI launches GPT-5.5, citing an 82.7% score on Terminal-Bench 2.0

Original: Introducing GPT-5.5. A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

LLM · Apr 25, 2026 · By Insights AI · 2 min read

OpenAI used its April 23 source post to frame GPT-5.5 as “a new class of intelligence for real work,” which is a sharper claim than a routine model bump. The company is pitching it as a system that can hold a messy objective, use tools, check its own work, and keep moving across a long task instead of waiting for step-by-step prompting. The rollout started in ChatGPT and Codex, with a separate GPT-5.5 Pro tier for harder questions.

The supporting product page makes the bet measurable. OpenAI says GPT-5.5 reaches 82.7% on Terminal-Bench 2.0, up from 75.1% for GPT-5.4, and 73.1% on its internal Expert-SWE eval for long-horizon coding work. On the same page, OpenAI says the model matches GPT-5.4 per-token latency in real-world serving while using fewer tokens in Codex. That combination matters more than a headline IQ bump: the whole point is to let an agent stay on task longer without burning through retries or stalling on infrastructure overhead.
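One way to read the jump from 75.1% to 82.7% is as a reduction in remaining failures rather than a raw 7.6-point gain. A minimal sketch of that framing, using only the scores quoted above:

```python
# Sketch: reframe a benchmark score gain as the fraction of previously
# failed tasks that the new model now completes. Uses the Terminal-Bench 2.0
# numbers quoted in the post (75.1% -> 82.7%); the framing itself is ours.
def error_reduction(old_score: float, new_score: float) -> float:
    """Fraction of the old model's failures eliminated by the new model."""
    old_err = 100.0 - old_score
    new_err = 100.0 - new_score
    return (old_err - new_err) / old_err

r = error_reduction(75.1, 82.7)
print(f"{r:.1%}")  # roughly 30% fewer failed tasks
```

By that measure, the gap is closer to a 30% cut in failed tasks, which is the kind of delta that changes whether an unattended agent run is worth kicking off.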

The account context matters too. The main OpenAI account is usually where flagship releases land, while @OpenAIDevs handled the developer follow-up thread and the April 24 API update. That second layer makes this feel less like a teaser and more like a coordinated platform push: ChatGPT for interactive work, Codex for longer-running engineering tasks, and API access for companies that want the same model inside their own systems. OpenAI also says GPT-5.5 is available in Codex with a 400K context window, which reinforces that this release is aimed at sustained computer work rather than short-answer chat.

The next signal to watch is whether outside harnesses confirm the same gap over GPT-5.4 once teams run it on their own repos, browser workflows, and internal research tasks. The other open question is price-performance at scale. If GPT-5.5 really keeps its latency while reducing token burn, it could shift which tasks people hand off to agents by default. If that efficiency breaks down under heavy enterprise use, the story becomes much smaller than the launch framing suggests.
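The price-performance question above reduces to simple arithmetic: at equal latency and equal per-token pricing, cost per task scales with tokens consumed. A toy model with entirely made-up numbers (the token counts and price below are illustrative, not from OpenAI):

```python
# Illustrative only: hypothetical token counts and price, chosen to show the
# shape of the tradeoff, not to reflect real GPT-5.4/GPT-5.5 figures.
def task_cost(tokens_per_task: int, price_per_mtok: float) -> float:
    """Dollar cost of one agent task at a flat per-million-token price."""
    return tokens_per_task / 1_000_000 * price_per_mtok

baseline  = task_cost(120_000, 10.0)  # hypothetical older-model run
efficient = task_cost(90_000, 10.0)   # same price, 25% fewer tokens
savings = 1 - efficient / baseline
print(f"{savings:.0%} cheaper per task")
```

If the token-efficiency claim holds at enterprise scale, that per-task discount compounds across thousands of agent runs; if it degrades under load, the economics revert to the old model's.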



