Introducing the OpenAI Safety Bug Bounty program
Original: Introducing the OpenAI Safety Bug Bounty program View original →
OpenAI said on March 25, 2026 that it is launching a public Safety Bug Bounty program on Bugcrowd to collect reports about AI abuse and safety risks across its products. The company frames the program as a complement to its existing Security Bug Bounty, with a focus on harmful behavior that may not fit the classic definition of a software security flaw but could still lead to tangible harm.
What OpenAI wants reported
According to the program overview, the new bounty covers AI-specific scenarios. One major category is agentic risk, including MCP-related testing. OpenAI says valid reports can include prompt injection or data exfiltration when attacker-controlled text can reliably hijack a victim's agent, including Browser, ChatGPT Agent, and similar products, to trigger a harmful action or leak sensitive information. OpenAI says the behavior must be reproducible at least 50% of the time.
The company also lists cases where an OpenAI agentic product performs a disallowed action on OpenAI's own website at scale, or performs another potentially harmful action with plausible and material harm. Additional in-scope areas include exposure of proprietary information related to reasoning, other OpenAI proprietary information, and account or platform integrity failures such as bypassing anti-automation controls, manipulating trust signals, or evading suspensions and bans.
What stays out of scope
OpenAI draws a boundary around general jailbreak reports. It says generic content-policy bypasses without a demonstrable safety or abuse impact are out of scope, and it gives examples of areas that may instead be handled through private campaigns, including some biorisk content issues in ChatGPT Agent and GPT-5. Any MCP-related testing must also comply with the terms of service of third parties involved.
Why this matters
The practical shift is that researchers now have a formal reporting path for safety and abuse failures that sit between policy enforcement and traditional security work. OpenAI says submissions will be triaged by its Safety and Security Bug Bounty teams and may be rerouted between the two programs depending on scope. That structure suggests the company expects growing overlap between model behavior, agent tooling, and platform controls as AI systems take more actions on behalf of users.
Related Articles
On March 25, 2026, OpenAI launched a public Safety Bug Bounty focused on AI abuse and safety risks. The new track complements its security program by accepting AI-specific failures such as prompt injection, data exfiltration, and harmful agent behavior.
OpenAI introduced its Safety Fellowship on X and published program details on April 6, 2026 for external researchers and practitioners working on AI safety and alignment. The move is notable because it extends work on evaluation, robustness, privacy-preserving safety methods, and agentic oversight beyond OpenAI’s internal teams.
OpenAI said on April 10, 2026 that a compromised Axios package touched a GitHub Actions workflow used in its macOS app-signing pipeline. The company says no user data, systems, or software were compromised, but macOS users need updated builds signed with a new certificate before May 8, 2026.
Comments (0)
No comments yet. Be the first to comment!