OpenAI launches Safety Bug Bounty for AI abuse, agentic, and platform risks
Original: Introducing the OpenAI Safety Bug Bounty program View original →
On March 25, 2026, OpenAI launched a public Safety Bug Bounty program aimed at identifying AI abuse and safety risks across its products. The company said the new program is designed to complement, not replace, its existing Security Bug Bounty. The distinction matters because some failures in modern AI systems create meaningful abuse or tangible harm even when they do not fit the traditional definition of a software security vulnerability.
The clearest focus is on agentic risk. OpenAI explicitly listed third-party prompt injection and data exfiltration cases in which attacker-controlled text can reliably hijack a victim's agent, including Browser, ChatGPT Agent, and similar agentic products, to take harmful actions or leak sensitive information. For some reports, the harmful behavior must be reproducible at least 50% of the time. The company also said it will consider reports where an agentic OpenAI product performs a disallowed action on OpenAI's website at scale, or where another harmful action can be tied to plausible and material harm.
The program also covers proprietary information exposure and account or platform integrity issues. That includes model generations that reveal proprietary reasoning-related information, as well as bypasses of anti-automation controls, manipulation of account trust signals, and evasion of account restrictions, suspensions, or bans. OpenAI drew a sharp boundary around what is not covered: general jailbreaks are out of scope unless they demonstrate concrete abuse or safety impact. Ordinary authorization issues still belong in the Security Bug Bounty, and low-signal policy bypasses without demonstrable harm are excluded.
This is a notable operational change for the wider AI industry. As AI products gain browsing, tool use, and multi-step action capabilities, the failure modes are no longer limited to model outputs. They now include action control, prompt-injection-driven misuse, and leakage of sensitive context across connected systems. OpenAI is effectively creating a formal intake channel for those gray-area issues, with reports routed between its safety and security teams depending on scope.
For developers and enterprises, the message is that AI threat models are widening. Teams building agent workflows, MCP-connected tools, or autonomous product features are being pushed to treat prompt injection, unintended actions, and context leakage as first-class operational risks. If similar programs spread across the industry, bug bounties may become a more standard governance layer for production AI systems rather than a security-only mechanism.
Related Articles
Sam Altman announced OpenAI reached an agreement with the U.S. Department of War to deploy AI models on classified networks, with core safety principles including bans on domestic mass surveillance and autonomous weapon systems.
OpenAI said on February 28, 2026 that it reached an agreement with the U.S. Department of War to deploy advanced AI systems in classified environments. In a follow-up post, the company said the arrangement uses a multi-layer safety approach and cloud-based deployment with cleared personnel in the loop.
OpenAI’s February 2026 safety report says it banned accounts linked to seven operations originating in China. The company says abuse covered cyber activity, covert influence, and scams, while overall malicious use remained low versus legitimate use.
Comments (0)
No comments yet. Be the first to comment!