A 2,000-person AI assistant attack test raises a harder question about responses

Fernando Irarrázaval put Fiu, an OpenClaw assistant, behind a public email address and invited people to make it leak a secrets.env file. According to the experiment write-up, more than 2,000 people sent over 6,000 emails after the project hit Hacker News. The secret did not leak, and the assistant did not send an unauthorized reply.

That sounds like a clean prompt-injection win, but the HN discussion quickly found the harder edge. Fiu was instructed not to reply to emails, partly because replying to every message would be expensive. Attackers therefore had to make the assistant both reveal the secret and respond. Commenters questioned whether a non-responding assistant is a strong proxy for the kind of agent people worry about in production.

The operational failures were just as useful as the security result. Google suspended the Gmail account after thousands of inbound messages and rapid API use. API costs passed $500. Batch processing also contaminated the experiment: when early messages in a batch were obvious attacks, the model became more suspicious of later messages. The setup was changed so each email ran in a fresh context, and memory files were cleared when the assistant appeared to infer that it was part of a public test.

The result is still meaningful. A strong model with a short, explicit set of rules resisted a large amount of direct social engineering. But it also shows why agent security tests need broader success criteria. A real assistant may reply, edit files, call tools, schedule meetings, or spend money. The community’s pushback was not that the experiment failed; it was that the next test needs to include more of those powers.

A 2,000-person AI assistant attack test raises a harder question about responses

Related Articles

Anthropic’s vuln harness is more workshop jig than boxed scanner

OpenAI details how it is hardening AI agents against prompt injection

Claude Tag turns Slack channels into shared AI-agent workspaces

Related Articles

Anthropic’s vuln harness is more workshop jig than boxed scanner
LLM Hacker News Jun 6, 2026 1 min read

OpenAI details how it is hardening AI agents against prompt injection
LLM Mar 15, 2026 2 min read

Claude Tag turns Slack channels into shared AI-agent workspaces