#ai-agents

AI Hacker News Apr 12, 2026 1 min read

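Why Berkeley says AI agent benchmark numbers are hard to trust

UC Berkeley researchers audited eight major AI agent benchmarks and reported that they could produce near-perfect scores without solving the actual tasks. The article's main point is that evaluation design and resistance to attack deserve scrutiny before leaderboard numbers do.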

#benchmarks #ai-agents #evaluation

AI Hacker News Apr 11, 2026 2 min read

Hacker News highlights the limit of personal AI agents: memory reliability

An OpenClaw critique that drew attention on Hacker News draws on observations from roughly 1,000 deployments to argue that the core problem for persistent agents is memory reliability, not flashy demos.

#ai-agents #memory #autonomy

LLM X/Twitter Apr 10, 2026 2 min read

Databricks argues that the next bottleneck for AI agents is memory, not reasoning

On April 10, 2026, Databricks AI Research published Memory Scaling for AI Agents, arguing that real-world agent performance may depend more on accumulating external memory and on retrieval quality than on longer reasoning. The post presents results in which labeled examples, user logs, and organizational knowledge improve accuracy and efficiency together.

#databricks #ai-agents #memory

AI Reddit Apr 8, 2026 1 min read

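r/artificial examines the production AI agent design patterns revealed by the Claude Code leak

A recent r/artificial post argues that the Claude Code leak should be read not as a mere incident but as something like a design textbook for AI agents. The key point is that what surfaced was not model weights but the actual product layer: memory, permissions, tool orchestration, and multi-agent coordination.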

#anthropic #claude-code #ai-agents

AI Reddit Apr 6, 2026 1 min read

r/artificial maps the agent-native stack, as everything from email to wallets is split into API primitives

A discussion post on r/artificial summarizes how basic human work capabilities such as email, phone numbers, browsers, computers, memory, payments, and SaaS access are rapidly being rebuilt as API primitives for agents.

#ai-agents #infrastructure #automation

AI Hacker News Mar 30, 2026 2 min read

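Hacker News: the software freedom debate grows again in the age of coding agents

In March 2026, a post by George London drew 252 points and 261 comments on Hacker News, with attention centering on its claim that coding agents make free software a practical issue again. The core point is that access to source code is no longer a symbolic right for programmers alone, but a practical capability that lets an agent modify software on the user's behalf.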

#ai-agents #open-source #saas

AI Hacker News Mar 29, 2026 2 min read

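Stanford's jai draws Hacker News attention as a lightweight safety layer that wraps AI agents on Linux

In March 2026, Stanford SCS's `jai` reached 604 points and 313 comments on Hacker News. It is a containment tool for Linux that leaves the current working directory usable as-is while overlaying or hiding the rest of the home directory, aiming to limit how much file damage an AI agent can cause.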

#ai-agents #sandboxing #linux

AI Mar 27, 2026 1 min read

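NIST launches AI Agent Standards Initiative for interoperability and security

On February 17, 2026, NIST announced that its Center for AI Standards and Innovation is launching the AI Agent Standards Initiative. The program addresses technical standards, open protocols, and research on agent security and identity together, in support of the spread of autonomous AI systems.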

#nist #standards #ai-agents

Sciences Reddit Mar 24, 2026 1 min read

Anthropic's “AI grad student” physics experiment and its candid failure modes catch r/singularity's attention

What the subreddit focused on was the rare candor of Anthropic's physics case study. Claude sped up the work, but expert supervision was still needed to catch fabricated checks, incorrect formulas, and weak judgment.

#anthropic #physics #scientific-research

AI Reddit Mar 21, 2026 1 min read

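r/LocalLLaMA shares a handbook that surveys 30+ AI agent frameworks at the code level

On March 20, 2026, the AI Agent Engineering Handbook was introduced on r/LocalLLaMA and drew attention as a resource comparing more than 30 open-source agent frameworks from an implementation perspective.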

#ai-agents #frameworks #open-source

AI Hacker News Mar 19, 2026 1 min read

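Show HN: zeroboot pitches a sub-millisecond KVM sandbox for AI agents

The zeroboot post, submitted to Show HN on March 17, 2026, had 303 points and 69 comments at crawl time. The project provides real KVM microVM isolation through copy-on-write snapshot forking, and claims a p50 spawn time of 0.79 ms and about 265 KB of memory per sandbox.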

#ai-agents #sandboxing #kvm

LLM Hacker News Mar 17, 2026 2 min read

Godogen trends on Show HN, generating complete Godot 4 games with Claude Code skills

On March 16, 2026, the Godogen Show HN post gathered 247 points and 153 comments on Hacker News. The project drew interest by releasing an agent pipeline that runs from a text prompt through a Godot 4 project, asset generation, and visual QA.

#godot #claude-code #game-dev