Anthropic moves Claude agent safety from prompts to sandboxes

Containment becomes the control plane for Claude agents

As AI agents gain access to files, terminals, internal tools, and remote workspaces, the practical safety problem changes. Anthropic’s May 26 post said agent “access and permissions” need to evolve with capability, then pointed readers to an engineering write-up on how Claude is contained across products. The source tweet is available on X.

The linked post frames the issue as blast radius. A less capable assistant can be restricted mostly by prompts and confirmations; a more capable agent needs its operating environment constrained. Anthropic describes different containment patterns for claude.ai, Claude Code, and Claude Cowork, including sandboxes, virtual machines, egress controls, and scoped permissions. The company says that a level of access it would have rejected 12 months earlier is now routine for internal developer productivity, which makes isolation a product requirement rather than a research footnote.
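To make the idea of constraining the environment concrete, here is a minimal sketch of a policy that combines a workspace boundary, an egress allowlist, and per-tool scopes. It is illustrative only: `SandboxPolicy`, its fields, and the tool and host names are hypothetical and do not come from Anthropic's write-up, and a real sandbox would enforce these rules at the operating-system or network layer rather than in application code.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from urllib.parse import urlparse


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy object for illustration; not an Anthropic API."""

    workspace_root: str
    allowed_hosts: frozenset = field(default_factory=frozenset)
    allowed_tools: frozenset = field(default_factory=frozenset)

    def allows_path(self, path: str) -> bool:
        """Allow only absolute paths at or below the workspace root."""
        root = PurePosixPath(self.workspace_root)
        candidate = PurePosixPath(path)
        # Reject relative paths and any ".." component (pure paths do not
        # collapse ".."). Symlinks are not handled in this lexical check.
        if not candidate.is_absolute() or ".." in candidate.parts:
            return False
        return candidate == root or root in candidate.parents

    def allows_egress(self, url: str) -> bool:
        """Allow outbound requests only to explicitly listed hosts."""
        host = urlparse(url).hostname
        return host is not None and host in self.allowed_hosts

    def allows_tool(self, tool: str) -> bool:
        """Allow only tools that were granted to this agent."""
        return tool in self.allowed_tools


# Assumed example values, chosen only for the demonstration.
policy = SandboxPolicy(
    workspace_root="/workspace",
    allowed_hosts=frozenset({"pypi.org"}),
    allowed_tools=frozenset({"read_file"}),
)
print(policy.allows_path("/workspace/src/main.py"))
print(policy.allows_path("/workspace/../etc/passwd"))
print(policy.allows_egress("https://evil.example/upload"))
```

The point of the sketch is that each check is independent of what the model was told in its prompt: a request outside the workspace, to an unlisted host, or through an ungranted tool is refused regardless of how the agent reasons about it.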

The most concrete figure concerns the weakness of approval dialogs. Anthropic says telemetry showed Claude Code users approved roughly 93% of permission prompts. At that rate, a human-in-the-loop system can degrade into a near-automatic click-through flow, especially when users see many prompts in a session. Claude Code auto mode is presented as one response: automate the safer approvals, reduce fatigue, and reserve human attention for decisions that actually need it.
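The general pattern of approving low-risk actions automatically and escalating the rest can be sketched in a few lines. This is an assumption-laden illustration, not a description of how Claude Code auto mode actually decides: the tool names, the `triage` rule, and the sample session are all hypothetical.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    ASK_HUMAN = "ask_human"


# Hypothetical tool names, chosen only for illustration.
READ_ONLY_TOOLS = frozenset({"read_file", "list_dir", "search"})


def triage(tool: str, inside_workspace: bool) -> Decision:
    """Auto-approve read-only actions inside the workspace; escalate the rest."""
    if tool in READ_ONLY_TOOLS and inside_workspace:
        return Decision.AUTO_APPROVE
    return Decision.ASK_HUMAN


def prompts_shown(actions: list[tuple[str, bool]]) -> int:
    """Count how many actions in a session would still reach the user."""
    return sum(
        1 for tool, inside in actions
        if triage(tool, inside) is Decision.ASK_HUMAN
    )


# Assumed sample session: (tool name, whether the target is inside the workspace).
session = [
    ("read_file", True),
    ("search", True),
    ("list_dir", True),
    ("read_file", False),  # outside the workspace: escalate
    ("run_shell", True),   # not read-only: escalate
]
print(prompts_shown(session), "of", len(session), "actions need a prompt")
```

In this toy session only two of five actions reach the user, which is the intended effect: fewer prompts, each more likely to deserve real attention.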

The post fits Anthropic’s usual focus on safety and interpretability, but its significance is operational. It mentions Claude Mythos Preview as a model whose blast radius was too high to ship in April 2026, while arguing that similar capability may become deployable as defenders harden systems. Watch whether sandbox policy, network controls, and per-tool permission scopes become baseline requirements for enterprise agents rather than optional security extras.
