Anthropic publishes measurements of AI agent autonomy in real-world use

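On February 18, 2026, Anthropic published a study of how much autonomy AI agents are actually given in real-world use. Using a privacy-preserving tool, the study analyzed millions of human-agent interactions across Claude Code and the Anthropic public API. Its questions are highly practical: how much latitude do people give agents, how does that change as they gain experience, which domains are agents used in, and how risky are their actions there? With this research, Anthropic argues that post-deployment monitoring is becoming essential for rolling out agents safely.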

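The most striking finding is the rapid growth of long autonomous runs in Claude Code. According to Anthropic, the 99.9th-percentile turn duration nearly doubled, from under 25 minutes in October 2025 to over 45 minutes in January 2026. Use of full auto-approve also rises with experience: roughly 20% among new users, but above 40% among users with around 750 sessions. Crucially, this does not mean oversight is disappearing. The human interrupt rate also rises with experience, which suggests that users are shifting from approving each action one by one to monitoring the agent and stepping in only when needed.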
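To make the headline metrics concrete, here is a minimal sketch of how a 99.9th-percentile turn duration and a full auto-approve rate could be computed from logs. It rests on assumptions: the nearest-rank percentile method, the helper names `nearest_rank_percentile` and `auto_approve_rate`, and the session record shape with a boolean `auto_approve` field are all hypothetical and are not taken from Anthropic's study or tooling, whose exact methodology is not described here.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) of values via the nearest-rank method.

    Hypothetical helper for illustration; not Anthropic's methodology.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Exact rational arithmetic: 99.9% of 1000 samples is exactly rank 999,
    # so float rounding cannot push the rank up by one.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    # Nearest rank: the smallest sample such that at least p% of samples are <= it.
    return ordered[rank - 1]


def auto_approve_rate(sessions):
    """Fraction of sessions run with full auto-approve enabled.

    Each session is assumed (hypothetically) to be a dict with a boolean
    "auto_approve" key; an empty list yields 0.0.
    """
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["auto_approve"]) / len(sessions)
```

For example, with 1,000 hypothetical turn durations of 1 to 1,000 minutes, `nearest_rank_percentile(list(range(1, 1001)), 99.9)` returns 999, so only one turn (the 1,000-minute one) lies above the 99.9th percentile. That is why a p99.9 figure describes the tail of the very longest runs rather than a typical session.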

Anthropic argues that an agent stopping on its own is also an important oversight mechanism. On the most complex tasks, Claude Code paused to ask for clarification more than twice as often as humans interrupted it. On the public API side, software engineering accounted for nearly 50% of agentic activity, but early use was also observed in higher-risk domains such as healthcare, finance, and cybersecurity. Anthropic notes, however, that most of the public API actions it observed were low-risk and reversible, and that high-risk deployments have not yet become widespread.

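The broader implication is that a gap remains between model capability and the autonomy granted in practice. Anthropic sees this as something close to a deployment overhang: models may be able to handle more independence than real users currently allow. At the same time, the simultaneous rise in auto-approve use, in interrupts, and in agent-initiated stops suggests that oversight in the agent era is shifting away from per-action approval toward intervention when needed and the model's own ability to stop itself. Anthropic concludes that safer agent deployment will require strong post-deployment monitoring infrastructure and new interaction patterns in which humans and AI share responsibility for oversight.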
