GPT-5.5-Cyber hits 85.6% as OpenAI moves security AI into patching

Original: OpenAI Daybreak shifts cyber AI from finding bugs to landing fixes

LLM · Jun 23, 2026 · By Insights AI (Twitter)
From vulnerability reports to reviewed fixes

AI security is moving past the narrow question of whether models can find vulnerabilities. OpenAI’s June 22 tweet framed Daybreak as a broader system for validation, evidence collection, patch generation, partner deployment, and open-source remediation.

“find, validate, and fix vulnerabilities right inside Codex”

The numbers make the shift concrete. In the linked OpenAI post, GPT-5.5-Cyber reaches 85.6% on CyberGym in a single-model evaluation, compared with 81.8% for GPT-5.5. It also beats GPT-5.5 on ExploitGym, 39.5% versus 25.95%, and SEC-bench Pro, 69.8% versus 63.1%. Codex Security has scanned more than 30,000 codebases and 30 million commits, with over 70,000 findings manually marked fixed and more than 500,000 findings automatically determined to be fixed.
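The size of each gap is easier to compare once it is computed. The short sketch below uses only the scores quoted above; the `gaps` helper and the choice to report both percentage-point gap and relative gain are illustrative assumptions, not something taken from the OpenAI post.

```python
# Scores (percent) as quoted in the paragraph above: (GPT-5.5-Cyber, GPT-5.5).
scores = {
    "CyberGym": (85.6, 81.8),
    "ExploitGym": (39.5, 25.95),
    "SEC-bench Pro": (69.8, 63.1),
}


def gaps(cyber, base):
    """Return (percentage-point gap, relative gain in percent), rounded.

    Hypothetical helper for illustration only.
    """
    point_gap = round(cyber - base, 2)
    relative_gain = round((cyber - base) / base * 100, 1)
    return point_gap, relative_gain


for name, (cyber, base) in scores.items():
    point_gap, relative_gain = gaps(cyber, base)
    print(f"{name}: +{point_gap} points, {relative_gain}% relative gain")
```

On these figures the ExploitGym gap is the widest of the three in both absolute and relative terms, while the headline CyberGym gap is the narrowest.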

OpenAI’s X account is the company’s main channel for product, research, and policy updates, so this tweet is best read as the public marker for a full product push. The Codex Security workflow described in the post includes deep scans, recent-change review, threat-model generation, attack-path tracing, validation evidence, remediation guidance, and codebase-specific patches prepared for human review. The emphasis is not just alert volume; it is whether a team can move from a finding to a merged fix without losing context.

Daybreak also widens distribution through a Cyber Partner Program for security vendors and a Patch the Planet initiative for open source. OpenAI says more than 30 projects have committed to participate, including cURL, Go, Python, Sigstore, and pyca/cryptography, with Trail of Bits, HackerOne, Calif, researchers, and maintainers involved in the remediation loop. The next thing to watch is whether those collaborations produce durable merged patches, lower false-positive load for maintainers, and clear governance around more permissive cyber-capable models.
