OpenAI launches Safety Bug Bounty for AI abuse, agentic, and platform risks

Original: Introducing the OpenAI Safety Bug Bounty program

AI, Mar 27, 2026, By Insights AI

On March 25, 2026, OpenAI launched a public Safety Bug Bounty program aimed at identifying AI abuse and safety risks across its products. The company said the new program is designed to complement, not replace, its existing Security Bug Bounty. The distinction matters because some failures in modern AI systems create meaningful abuse or tangible harm even when they do not fit the traditional definition of a software security vulnerability.

The clearest focus is on agentic risk. OpenAI explicitly listed third-party prompt injection and data exfiltration cases in which attacker-controlled text can reliably hijack a victim's agent, including Browser, ChatGPT Agent, and similar agentic products, to take harmful actions or leak sensitive information. For some reports, the harmful behavior must be reproducible at least 50% of the time. The company also said it will consider reports where an agentic OpenAI product performs a disallowed action on OpenAI's website at scale, or where another harmful action can be tied to plausible and material harm.
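To make the reproducibility bar concrete, here is a minimal sketch of how a researcher might check a finding against a 50% threshold before submitting. The function names, the trial data, and the idea of logging each attempt as a boolean are illustrative assumptions, not part of OpenAI's program rules.

```python
def reproduction_rate(outcomes):
    """Return the fraction of trials in which the harmful behavior occurred."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(1 for hit in outcomes if hit) / len(outcomes)


def meets_threshold(outcomes, threshold=0.5):
    """True when the behavior reproduced in at least `threshold` of trials."""
    return reproduction_rate(outcomes) >= threshold


# Hypothetical log of ten attempts: True means the injection succeeded.
trials = [True, False, True, True, False, True, False, True, True, False]

rate = reproduction_rate(trials)
print(f"{rate:.0%} reproduction rate, meets 50% bar: {meets_threshold(trials)}")
# prints "60% reproduction rate, meets 50% bar: True"
```

A larger number of trials gives a more trustworthy estimate; ten attempts is used here only to keep the example short.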

The program also covers proprietary information exposure and account or platform integrity issues. That includes model generations that reveal proprietary reasoning-related information, as well as bypasses of anti-automation controls, manipulation of account trust signals, and evasion of account restrictions, suspensions, or bans. OpenAI drew a sharp boundary around what is not covered: general jailbreaks are out of scope unless they demonstrate concrete abuse or safety impact. Ordinary authorization issues still belong in the Security Bug Bounty, and low-signal policy bypasses without demonstrable harm are excluded.
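The scope boundaries described above can be sketched as a simple routing function. This is only an illustration of the rules as summarized in this article; the category labels, the `Report` type, and the `needs_review` fallback are hypothetical and do not reflect OpenAI's actual intake process.

```python
from dataclasses import dataclass

# Hypothetical labels for the in-scope areas named in the article.
IN_SCOPE = {
    "agentic_prompt_injection",
    "data_exfiltration",
    "proprietary_info_exposure",
    "platform_integrity",
}


@dataclass
class Report:
    category: str            # hypothetical label chosen by the reporter
    demonstrates_harm: bool  # concrete abuse or safety impact shown?


def triage(report):
    """Route a report according to the scope rules summarized above."""
    if report.category == "authorization":
        # Ordinary authorization issues stay with the Security Bug Bounty.
        return "security_bug_bounty"
    if report.category == "general_jailbreak":
        # Jailbreaks count only when they show concrete abuse or safety impact.
        return "safety_bug_bounty" if report.demonstrates_harm else "out_of_scope"
    if report.category == "policy_bypass":
        # Low-signal bypasses without demonstrable harm are excluded; the
        # article does not say how bypasses with harm are handled.
        return "needs_review" if report.demonstrates_harm else "out_of_scope"
    if report.category in IN_SCOPE:
        return "safety_bug_bounty"
    return "needs_review"


print(triage(Report("general_jailbreak", demonstrates_harm=False)))
# prints "out_of_scope"
```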

This is a notable operational change for the wider AI industry. As AI products gain browsing, tool use, and multi-step action capabilities, the failure modes are no longer limited to model outputs. They now include action control, prompt-injection-driven misuse, and leakage of sensitive context across connected systems. OpenAI is effectively creating a formal intake channel for those gray-area issues, with reports routed between its safety and security teams depending on scope.

For developers and enterprises, the message is that AI threat models are widening. Teams building agent workflows, MCP-connected tools, or autonomous product features are being pushed to treat prompt injection, unintended actions, and context leakage as first-class operational risks. If similar programs spread across the industry, bug bounties may become a more standard governance layer for production AI systems rather than a security-only mechanism.


© 2026 Insights. All rights reserved.