#agents

AI Mar 27, 2026 1 min read

OpenAI launches public Safety Bug Bounty, broadening reporting scope to AI abuse and agentic risk

On March 25, 2026, OpenAI launched a public Safety Bug Bounty targeting AI abuse and safety risks. It gives AI-specific issues that the existing Security Bug Bounty struggled to cover, such as agentic misuse, prompt injection, and data exfiltration, a separate reporting channel.

#openai #ai-safety #bug-bounty

LLM Twitter Mar 26, 2026 1 min read

Cloudflare expands Dynamic Workers for sandboxed AI code execution to open beta

On March 24, 2026, Cloudflare said Dynamic Workers let AI-generated code run inside secure, lightweight isolates, and claimed this approach is 100 times faster than conventional containers. According to the Cloudflare blog, the feature is now in open beta for paid Workers users, and setting <code>globalOutbound: null</code> blocks the code's direct access to the external internet.
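A rough TypeScript sketch of how a Worker might hand AI-generated code to an isolate with outbound access disabled. Only `globalOutbound: null` comes from the description above; the loader object, its `get(id, callback)` shape, `getEntrypoint()`, and the other config fields are assumptions modeled on Cloudflare's Worker Loader documentation. They are declared locally here so the sketch type-checks and runs outside the Workers runtime, so check the current Cloudflare docs before relying on any of these names.

```typescript
// Local stand-ins for the Workers runtime types. These shapes are ASSUMPTIONS
// modeled on Cloudflare's Worker Loader docs, not an authoritative API.
interface WorkerCode {
  compatibilityDate: string;
  mainModule: string;
  modules: Record<string, string>;
  // null = no direct outbound network access (this sketch models only null)
  globalOutbound: null;
}

interface Entrypoint {
  fetch(input: string): Promise<Response>;
}

interface WorkerStub {
  getEntrypoint(): Entrypoint;
}

interface WorkerLoader {
  get(id: string, getCode: () => Promise<WorkerCode>): WorkerStub;
}

// Build the isolate config for one piece of AI-generated code.
function sandboxConfig(generatedModule: string): WorkerCode {
  return {
    compatibilityDate: "2026-03-01", // hypothetical date
    mainModule: "agent.js", // hypothetical module name
    modules: { "agent.js": generatedModule },
    globalOutbound: null, // block direct internet access
  };
}

// Load the code into an isolate via the (assumed) loader and call its fetch handler.
async function runGenerated(
  loader: WorkerLoader,
  id: string,
  generatedModule: string,
): Promise<string> {
  const worker = loader.get(id, async () => sandboxConfig(generatedModule));
  const res = await worker.getEntrypoint().fetch("http://sandbox.invalid/");
  return res.text();
}
```

Because the loader is passed in as a parameter, the same function can be exercised with a mock or, presumably, with a real binding inside a Worker; either way the isolate config, not the generated code, decides whether outbound requests are possible.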

#cloudflare #agents #sandboxing

LLM Reddit Mar 26, 2026 2 min read

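What r/LocalLLaMA noticed in NVIDIA's open-weight strategy: the Nemotron signal matters more than the reported $26B investment

On r/LocalLLaMA, a report that NVIDIA could put $26 billion into open-weight AI models over the next five years spread quickly, but the core discussion was about strategy rather than the number. Nemotron 3 Super, released in March 2026, is the clearest sign that NVIDIA is pushing open models, tooling, and Blackwell-optimized deployment as a single package.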

#nvidia #open-weights #nemotron

AI Hacker News Mar 26, 2026 2 min read

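Hacker News spotlights ARC-AGI-3, a new agent benchmark centered on interaction and adaptation

ARC Prize describes ARC-AGI-3 not as a test of accuracy on static puzzles but as an interactive reasoning benchmark that measures planning, memory compression, and belief updating inside novel environments. On Hacker News, this design drew strong interest because commenters saw it as better at revealing real agent behavior.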

#arc-agi #benchmark #agents

LLM Mar 26, 2026 1 min read

OpenAI and Amazon bundle Bedrock, Frontier, Trainium, and investment into one deal

On February 27, 2026, Amazon and OpenAI announced a multi-year strategic partnership covering a Stateful Runtime Environment built on Amazon Bedrock, distribution of Frontier through AWS, and long-term Trainium capacity. Amazon also said it would invest $50 billion in OpenAI.

#openai #amazon #bedrock

LLM Mar 26, 2026 1 min read

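Anthropic acquires Vercept to strengthen Claude's computer use

On February 25, 2026, Anthropic announced it had acquired Vercept to strengthen Claude's computer use capabilities. The company framed the deal as a continuation of Sonnet 4.6's 72.5% score on OSWorld and of its strategy of building agents that act inside live applications.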

#anthropic #claude #computer-use

AI Hacker News Mar 26, 2026 1 min read

ARC-AGI-3 resets the bar for interactive reasoning benchmarks

Released by ARC Prize on March 24, 2026, ARC-AGI-3 is a new benchmark that puts interactive reasoning ahead of static tasks. On HN it drew 238 points and 163 comments and was seen as a turning point in how agents are evaluated.

#arc-agi #agents #benchmark

LLM Mar 25, 2026 1 min read

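Microsoft Research open-sources AgentRx to find the first critical failure in AI agent runs

Microsoft Research released AgentRx, which identifies the first critical failure step in long agent trajectories. Alongside it, the team published a benchmark of 115 failed trajectories and a nine-category failure taxonomy, and reported quantitative improvements in failure localization and root-cause attribution.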
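The text does not say how AgentRx localizes failures, so the following is only a naive TypeScript illustration of what "first critical failure step" means, not AgentRx's method: scan a trajectory in order and return the index of the first step a checker flags. The `Step` shape and the `isFailure` checker are hypothetical; in practice the checker might be an LLM judge or a set of rules.

```typescript
// One step of an agent trajectory (hypothetical shape for illustration).
interface Step {
  action: string;
  observation: string;
}

// A checker that decides whether a step is a failure, given the steps before it.
// Hypothetical: could be a rule, a validator, or an LLM judge.
type StepCheck = (step: Step, index: number, history: readonly Step[]) => boolean;

// Return the index of the first step the checker flags, or -1 if none is flagged.
function firstCriticalFailure(trajectory: readonly Step[], isFailure: StepCheck): number {
  for (let i = 0; i < trajectory.length; i++) {
    if (isFailure(trajectory[i], i, trajectory.slice(0, i))) {
      return i;
    }
  }
  return -1;
}
```

Returning only the first flagged step reflects the idea that later errors are often downstream of an earlier one; a real tool would also need to assign the failure to a category in its taxonomy.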

#agents #debugging #opensource

LLM Mar 25, 2026 1 min read

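Anthropic releases Claude Sonnet 4.6 with a 1M-token context window and stronger agent workflows

On Feb 17, 2026, Anthropic released Claude Sonnet 4.6, saying it improves coding, computer use, long-context reasoning, and agent planning across the board. Pricing stays the same as Sonnet 4.5 at $3/$15 per million input/output tokens, while the model adds a 1M-token context window and several new tool features.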
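A quick worked example of what the quoted $3/$15 pricing means per request, assuming $3 per million input tokens and $15 per million output tokens and ignoring prompt caching, batch discounts, and any separate pricing that may apply to very long prompts. The function name is illustrative.

```typescript
// List prices quoted above for Sonnet 4.6, in USD per million tokens.
const INPUT_PER_MTOK = 3;
const OUTPUT_PER_MTOK = 15;

// Estimated cost of one request in USD. Multiplying before dividing keeps
// results like 0.75 exact in floating point.
// Ignores caching, batch pricing, and any long-prompt pricing tiers.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens * INPUT_PER_MTOK + outputTokens * OUTPUT_PER_MTOK) / 1_000_000;
}
```

For example, a request with 200,000 input tokens and 10,000 output tokens comes to (200,000 × 3 + 10,000 × 15) / 1,000,000 = $0.75.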

#anthropic #claude #llm

LLM Reddit Mar 24, 2026 1 min read

r/singularity sees Anthropic's Dispatch as the next step toward a phone-first AI coworker

r/singularity read Anthropic's launch of Dispatch alongside computer use as a real product shift toward a phone-first AI coworker. At the same time, users pointed out the limits of the macOS-only rollout and of screen-driven automation.

#claude #computer-use #mobile

LLM Mar 24, 2026 1 min read

NVIDIA unveils OpenShell, proposing runtime-level security isolation for autonomous agents

NVIDIA unveiled OpenShell on March 23, 2026. The company said that giving each autonomous agent its own sandbox and placing security policy at the infrastructure layer makes agentic workflows safer to operate.
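To make "security policy at the infrastructure layer" concrete, here is a hypothetical TypeScript sketch in which each agent gets its own sandbox object, and every network or exec request is checked against a default-deny policy held by the sandbox rather than by the agent. None of these names come from OpenShell; its actual policy format and API are not described above.

```typescript
// Hypothetical policy shape (not OpenShell's format): explicit allowlists.
interface SandboxPolicy {
  allowedHosts: ReadonlySet<string>;
  allowedCommands: ReadonlySet<string>;
}

// Requests an agent can make to the outside world (hypothetical).
type AgentAction =
  | { kind: "net"; host: string }
  | { kind: "exec"; command: string };

// Default-deny: anything not explicitly allowed is rejected.
function isAllowed(policy: SandboxPolicy, action: AgentAction): boolean {
  if (action.kind === "net") {
    return policy.allowedHosts.has(action.host);
  }
  return policy.allowedCommands.has(action.command);
}

// One sandbox per agent; the policy lives here, outside the agent's own logic.
class Sandbox {
  readonly agentId: string;
  private readonly policy: SandboxPolicy;

  constructor(agentId: string, policy: SandboxPolicy) {
    this.agentId = agentId;
    this.policy = policy;
  }

  request(action: AgentAction): boolean {
    return isAllowed(this.policy, action);
  }
}
```

Because the check sits in the sandbox, a compromised or prompt-injected agent cannot loosen its own permissions by changing its instructions.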

#nvidia #agents #security

AI Hacker News Mar 24, 2026 1 min read

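Mozilla.ai's cq draws HN attention as a Stack Overflow-style memory layer for coding agents

Mozilla.ai's `cq` caught HN's attention as a proposal for a local-first knowledge commons in which coding agents, instead of relying only on static repo instructions, query, verify, and share task-specific lessons.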
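The text gives no schema for `cq`, so the following TypeScript sketch only illustrates the query/verify/share cycle it describes, using a hypothetical in-memory, local-only store: lessons are added locally, confirmed when they prove useful, and queried by topic with a minimum confirmation count. The `Lesson` fields, the `LocalLessonStore` class, and the confirmation threshold are all assumptions, and any sharing layer is left out.

```typescript
// Hypothetical lesson record; cq's actual data model is not described in the text.
interface Lesson {
  id: number;
  topic: string; // e.g. a library name or error message
  text: string;
  confirmations: number; // how many times the lesson was verified in practice
}

// Local-only store (hypothetical); sharing with others is out of scope here.
class LocalLessonStore {
  private lessons: Lesson[] = [];
  private nextId = 1;

  // "Share" step, local side: record a new, not-yet-verified lesson.
  add(topic: string, text: string): Lesson {
    const lesson: Lesson = { id: this.nextId++, topic, text, confirmations: 0 };
    this.lessons.push(lesson);
    return lesson;
  }

  // "Verify" step: bump the count when the lesson actually helped.
  confirm(id: number): void {
    const lesson = this.lessons.find((l) => l.id === id);
    if (lesson) lesson.confirmations++;
  }

  // "Query" step: lessons on a topic with enough confirmations, most confirmed first.
  query(topic: string, minConfirmations = 1): Lesson[] {
    return this.lessons
      .filter((l) => l.topic === topic && l.confirmations >= minConfirmations)
      .sort((a, b) => b.confirmations - a.confirmations);
  }
}
```

Filtering on confirmations is one simple way to keep unverified tips from reaching an agent by default.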

#agents #developer-tools #mozilla-ai