OpenAI puts GPT-5.5 bio jailbreaks on bounty with a $25,000 prize
Original: GPT-5.5 Bio Bug Bounty
Safety announcements often talk in abstractions. OpenAI's new GPT-5.5 Bio Bug Bounty is more concrete: it is paying for proof that a single prompt can break its bio safeguards, not just for vague reports that a model feels risky. The company is offering $25,000 to the first researcher who finds a universal jailbreak that clears all five questions in its bio safety challenge.
The scope is narrow on purpose. OpenAI says the model in scope is GPT-5.5 in Codex Desktop only, and the target is a clean chat that answers all five bio safety questions without triggering moderation. That is a high bar. A one-off anomalous answer would not meet it; the winning result has to show a reusable prompt that generalizes across the full test set OpenAI has defined.
The program is not a public free-for-all. Applications opened on April 23, 2026 and close on June 22, 2026, with formal testing scheduled from April 28 through July 27. OpenAI says it will invite a vetted list of trusted bio red-teamers, review new applications, and onboard accepted participants onto a dedicated platform. All prompts, completions, findings, and communications are covered by NDA.
That structure says a lot about how frontier-model safety work is changing. OpenAI is still controlling access, scope, and disclosure, but it is also moving beyond internal evaluation by inviting outside researchers to attack the model under explicit rules and cash incentives. In practical terms, the company is turning one of the hardest safety questions in AI (whether safeguards fail under persistent adversarial pressure) into a paid test with a clear pass-fail condition.
The page also points readers to OpenAI's broader safety and security bug bounty programs, which suggests this is part of a larger external-testing pipeline rather than a one-off stunt around GPT-5.5. What matters next is simple: whether anyone claims the $25,000 prize, what classes of jailbreak attempts prove most effective, and how quickly those lessons feed back into model defenses.
Related Articles
OpenAI is widening access to GPT-5.4-Cyber through verified cyber-defense channels, with $10 million in API credits and government evaluation access attached. The real story is the access model: stronger cyber capability is being paired with identity checks, tiered trust, and accountability rather than a simple public release.
OpenAI’s April 21 system card puts concrete safety numbers behind ChatGPT Images 2.0, including 6.7% policy-violating generations before final blocking in thinking mode. The card matters because higher realism, web-grounded image reasoning, biorisk prompts, and provenance are now treated as one deployment problem.
HN focused less on the demo reel and more on whether the model can obey dense prompts. ChatGPT Images 2.0 arrived with broader style, multilingual text, and layout examples, but the thread quickly moved into prompt adherence, pricing, and synthetic media fatigue.