#red-teaming

AI 3d ago 2 min read

OpenAI puts GPT-5.5 bio jailbreaks on bounty with a $25,000 prize

OpenAI is attaching cash to the hardest kind of safety failure: a single prompt that breaks all five of its bio safeguards. The new GPT-5.5 Bio Bug Bounty pays $25,000 for a universal jailbreak, limits testing to GPT-5.5 in Codex Desktop, and starts formal testing on April 28.

#openai #gpt-5.5 #biosecurity

AI Hacker News Apr 13, 2026 2 min read

Hacker News spotlights Berkeley's warning that top AI agent benchmarks are vulnerable to score hacking

A 520-point Hacker News thread amplified Berkeley's claim that eight major AI agent benchmarks can be pushed toward near-perfect scores through harness exploits instead of genuine task completion.

#ai-agents #benchmarks #evaluation

LLM Hacker News Apr 8, 2026 2 min read

Hacker News Tracks Claude Mythos Preview's Cybersecurity Leap

Anthropic's April 7, 2026 security write-up for Claude Mythos Preview argues that frontier LLM gains are now translating into real exploit-development capability. Hacker News is treating the post as a sign that defensive tooling and offensive risk are accelerating together.

#anthropic #cybersecurity #llm

LLM Mar 28, 2026 2 min read

OpenAI moves to acquire Promptfoo to bring agent security testing into Frontier

OpenAI announced plans to acquire Promptfoo on March 9, 2026. The company says Promptfoo’s security testing and evaluation technology will be integrated into OpenAI Frontier so enterprises can test and document risks such as prompt injection, jailbreaks, data leaks, and tool misuse earlier in the development cycle.

#openai #promptfoo #ai-security

AI Mar 13, 2026 1 min read

OpenAI moves to acquire Promptfoo and fold agent security testing into Frontier

OpenAI says it plans to acquire Promptfoo and bring its evaluation, red-teaming, and traceability tooling into OpenAI Frontier. Promptfoo's open-source project will stay under its current license, and the deal remains subject to customary closing conditions.

#openai #promptfoo #agent-security

AI Hacker News Mar 7, 2026 2 min read

HN Focus: Anthropic and Mozilla Put AI-Assisted Firefox Security on Measurable Ground

The Anthropic-Mozilla collaboration that spread on Hacker News disclosed that Claude Opus 4.6 found 22 Firefox vulnerabilities, 14 of them high-severity. The durable lesson is not autonomous magic but faster defender workflows built around validation, triage, and reproducible evidence.

#cybersecurity #firefox #anthropic