#tool-use

LLM Apr 17, 2026 1 min read

IBM VAKRA measures where tool agents break down by using an executable environment

IBM Research's VAKRA moves agent benchmarking from static Q&A to an executable tool environment. It covers 62 domains, 8,000+ locally hosted APIs, and 3-7 step reasoning chains, and the results suggest that agent reliability still struggles to get past the tool-demo level.

#agents #benchmarks #ibm

LLM Reddit Apr 12, 2026 1 min read

MiniMax M2.7 as seen by r/LocalLLaMA: closer to an agent system than a chat model

MiniMax M2.7 rose quickly on r/LocalLLaMA because its Hugging Face release was positioned not as a plain chat model but as an agent system that bundles tool use, Agent Teams, and a deployment guide. Early interest is focused as much on packaging that can actually be operated as on benchmark numbers.

#llm #agents #tool-use

LLM Hacker News Apr 6, 2026 2 min read

Hacker News: the six core building blocks of a coding agent

In a post published on April 4, 2026, Sebastian Raschka argues that performance differences between coding agents come more from harness design than from the base model alone. He names live repo context, prompt/cache reuse, structured tools, context reduction, session memory, and bounded subagents as the six core components, and Hacker News took this as a practical frame of reference for understanding tools such as Codex and Claude Code.

#coding-agents #agent-harness #repo-context

LLM Reddit Apr 1, 2026 1 min read

Liquid AI's LFM2.5-350M draws Reddit's attention, targeting the agentic edge with 350M parameters

LFM2.5-350M became a talking point on LocalLLaMA because it is not a small general-purpose model but a 350M edge model tuned for tool use and structured output. Liquid AI says it extended pretraining from 10T to 28T tokens and added large-scale RL.

#liquid-ai #small-models #agentic

LLM Reddit Feb 28, 2026 1 min read

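Reddit issue: a “Reverse CAPTCHA” analysis of how invisible Unicode characters can alter instructions to AI agents

A security study that drew attention on r/artificial analyzed how hidden instructions built from zero-width characters and Unicode Tags affect tool-using LLM agents. The public summary reports an evaluation of 5 models and 8,308 outputs.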

#ai-security #prompt-injection #unicode
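
For readers who want to see what such hidden characters look like in practice, here is a minimal Python sketch. It is not the study's method. It assumes the relevant characters are the Unicode Tags block (U+E0000 to U+E007F) plus a small, non-exhaustive set of common zero-width code points, and the helper names `find_hidden`, `decode_tags`, and `strip_hidden` are hypothetical.

```python
# Assumption: a short, non-exhaustive set of zero-width code points
# (ZWSP, ZWNJ, ZWJ, WORD JOINER, BOM / zero-width no-break space).
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}

# The Unicode Tags block; U+E0020..U+E007E mirror printable ASCII.
TAG_START, TAG_END = 0xE0000, 0xE007F


def is_hidden(ch: str) -> bool:
    """True if ch is a zero-width or Unicode Tags character."""
    cp = ord(ch)
    return cp in ZERO_WIDTH or TAG_START <= cp <= TAG_END


def find_hidden(text: str) -> list[tuple[int, int]]:
    """Return (index, code point) pairs for every hidden character."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if is_hidden(ch)]


def decode_tags(text: str) -> str:
    """Map Tags characters back to the ASCII characters they mirror."""
    return "".join(
        chr(ord(ch) - 0xE0000) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )


def strip_hidden(text: str) -> str:
    """Remove hidden characters before the text reaches an agent."""
    return "".join(ch for ch in text if not is_hidden(ch))


if __name__ == "__main__":
    # Build a sample: visible text with an invisible payload in the middle.
    payload = "".join(chr(0xE0000 + ord(c)) for c in "ignore")
    sample = "Hello" + payload + "\u200b world"
    print(find_hidden(sample))
    print(decode_tags(sample))
    print(strip_hidden(sample))
```

Stripping is the bluntest option: zero-width joiners are legitimate in some scripts and emoji sequences, so a real pipeline may prefer to flag such input for review rather than silently rewrite it.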