GPT-5.5 pushes agentic coding higher without adding latency
Original: Introducing GPT-5.5
OpenAI is not framing GPT-5.5 as a routine benchmark refresh. The point of this release is that the model is supposed to carry more of a messy task on its own: write and debug code, search the web, analyze data, create documents and spreadsheets, move across tools, and keep going without constant hand-holding. That matters because the current crop of coding agents has looked strongest in demos and eval charts, but has still hit friction once real projects turn ambiguous.
The headline numbers are aimed straight at that gap. On the launch page, OpenAI says GPT-5.5 reaches 82.7% on Terminal-Bench 2.0, 58.6% on SWE-Bench Pro, 84.9% on GDPval and 78.7% on OSWorld-Verified. OpenAI also says the model matches GPT-5.4's per-token latency in real-world serving while using fewer tokens on Codex tasks. If those numbers hold in production, the story here is not just "smarter model" but "more useful work per second and per token."
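The arithmetic behind that framing is simple: if per-token latency is unchanged, wall-clock time per task scales directly with token count, so a token reduction is a like-for-like speedup. A minimal sketch with made-up figures (the 20 ms latency and both token budgets are illustrative assumptions, not OpenAI numbers):

```python
# Back-of-the-envelope: why "same per-token latency + fewer tokens"
# translates into faster tasks. All numbers are illustrative.

def task_seconds(tokens: int, seconds_per_token: float) -> float:
    """Wall-clock time to stream a task's output."""
    return tokens * seconds_per_token

LATENCY = 0.02  # hypothetical: 20 ms per token, identical for both models

old = task_seconds(10_000, LATENCY)  # assumed GPT-5.4-style token budget
new = task_seconds(7_500, LATENCY)   # assumed 25% fewer tokens on the same task

print(old, new)       # 200.0 150.0 seconds
print(1 - new / old)  # 0.25 -> 25% less wall-clock time per task
```

Under these assumptions the per-task saving equals the token saving, which is why token efficiency at equal latency is a throughput claim, not just a cost claim.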
OpenAI is also tying the release tightly to products people already use. On April 23, the company said GPT-5.5 was rolling out to Plus, Pro, Business and Enterprise users in ChatGPT and Codex, while GPT-5.5 Pro was rolling out to Pro, Business and Enterprise tiers. The same page was updated on April 24 to say both GPT-5.5 and GPT-5.5 Pro are now available in the API as well, which quickly turns the launch from a ChatGPT feature into a platform event for developer tooling.
The coding angle is where the pressure lands first. OpenAI says GPT-5.5 is its strongest agentic coding model so far, and the company is pairing that claim with stories from early testers who used it on debugging, refactors and large merges. Those anecdotes are marketing, not proof, but the benchmark mix matters: terminal workflows, GitHub issue resolution and real computer use are all closer to how engineers actually delegate work than older one-shot coding tests.
There is still a real caveat in the fine print. OpenAI says GPT-5.5 ships with its strongest safeguards yet, expanded testing for cyber and biology capabilities, and feedback from nearly 200 early-access partners. That is the price of shipping a model that is explicitly better at long-running action. What to watch is not the first-day chart; it is how reliably GPT-5.5 behaves once it starts touching enterprise repos, production spreadsheets and browser-based workflows at scale.