#agent-safety

AI Reddit Apr 16, 2026 1 min read

Prompt guardrails alone are not enough. What LocalLLaMA came back to was stopping actions before execution

This post resonated because it put into words the unease agent builders already feel. Once a model calls APIs, modifies files, runs scripts, and touches browsers or MCP tools, the problem is no longer output quality but execution control.

#ai-agents #agent-safety #guardrails

LLM X/Twitter Mar 26, 2026 1 min read

Anthropic details Claude Code auto mode as a classifier-based middle ground for autonomous execution

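On March 25, 2026, Anthropic explained that Claude Code auto mode replaces many permission prompts with classifiers, offering a safer path to autonomous execution than skipping all approvals. According to the engineering article, the feature combines a prompt-injection probe with a two-stage transcript classifier and reports a 0.4% false-positive rate on end-to-end real traffic.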

#anthropic #claude-code #agent-safety

LLM Hacker News Mar 12, 2026 1 min read

Hacker News examines a context-aware permission guard for Claude Code

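nah, posted to Show HN, proposed a PreToolUse hook that classifies the actual effect of a tool call rather than applying a blanket allow-or-deny. The README highlights path checks, content inspection, and optional LLM escalation, while the HN discussion centered on sandboxing, command chains, and whether policy engines can really rein in agentic tools.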

#llm #agent-safety #claude-code