OpenAI launches Safety Bug Bounty for AI abuse, agentic, and platform risks

On March 25, 2026, OpenAI launched a public Safety Bug Bounty program aimed at identifying AI abuse and safety risks across its products. The company said the new program is designed to complement, not replace, its existing Security Bug Bounty. The distinction matters because some failures in modern AI systems create meaningful abuse or tangible harm even when they do not fit the traditional definition of a software security vulnerability.

The clearest focus is on agentic risk. OpenAI explicitly listed third-party prompt injection and data exfiltration cases in which attacker-controlled text can reliably hijack a victim's agent, including Browser, ChatGPT Agent, and similar agentic products, to take harmful actions or leak sensitive information. For some reports, the harmful behavior must be reproducible at least 50% of the time. The company also said it will consider reports where an agentic OpenAI product performs a disallowed action on OpenAI's website at scale, or where another harmful action can be tied to plausible and material harm.

The program also covers proprietary information exposure and account or platform integrity issues. That includes model generations that reveal proprietary reasoning-related information, as well as bypasses of anti-automation controls, manipulation of account trust signals, and evasion of account restrictions, suspensions, or bans. OpenAI drew a sharp boundary around what is not covered: general jailbreaks are out of scope unless they demonstrate concrete abuse or safety impact. Ordinary authorization issues still belong in the Security Bug Bounty, and low-signal policy bypasses without demonstrable harm are excluded.

This is a notable operational change for the wider AI industry. As AI products gain browsing, tool use, and multi-step action capabilities, the failure modes are no longer limited to model outputs. They now include action control, prompt-injection-driven misuse, and leakage of sensitive context across connected systems. OpenAI is effectively creating a formal intake channel for those gray-area issues, with reports routed between its safety and security teams depending on scope.

For developers and enterprises, the message is that AI threat models are widening. Teams building agent workflows, MCP-connected tools, or autonomous product features are being pushed to treat prompt injection, unintended actions, and context leakage as first-class operational risks. If similar programs spread across the industry, bug bounties may become a more standard governance layer for production AI systems rather than a security-only mechanism.

OpenAI launches Safety Bug Bounty for AI abuse, agentic, and platform risks

Related Articles

Introducing the OpenAI Safety Bug Bounty program

OpenAI Says Enterprise AI Has Moved Beyond Experimentation

OpenAI puts GPT-5.4-Cyber in the hands of vetted defenders

Comments (0)

Leave a Comment

Related Articles

Introducing the OpenAI Safety Bug Bounty program
AI Apr 12, 2026 2 min read

OpenAI Says Enterprise AI Has Moved Beyond Experimentation
AI Apr 13, 2026 2 min read

OpenAI puts GPT-5.4-Cyber in the hands of vetted defenders
AI sources.official Apr 17, 2026 2 min read