#tool-use

LLM Apr 17, 2026 1 min read

IBM VAKRA、tool agentが壊れる箇所を実行環境で測る

IBM ResearchのVAKRAはagent評価をstatic Q&Aからexecutable tool environmentへ移した。62 domains、8,000+ locally hosted APIs、3-7 step reasoning chainsを含み、surface-level tool useとenterprise agent reliabilityの差を示している。

#agents #benchmarks #ibm

LLM Reddit Apr 12, 2026 1 min read

r/LocalLLaMAが見たMiniMax M2.7、chat modelよりagent systemに近い

r/LocalLLaMAでMiniMax M2.7が一気に伸びた理由は、Hugging Face公開が単なるchat modelではなく、tool use、Agent Teams、deployment guideまで含むagent systemとして提示されたからだ。初期の関心はbenchmarkの数字だけでなく、実運用を意識したpackagingにも向いている。

#llm #agents #tool-use

LLM Hacker News Apr 6, 2026 1 min read

Hacker News、coding agentを支える6つの構成要素整理に注目

Sebastian Raschkaが2026年4月4日に公開した記事は、coding agentの実力差はbase modelだけでなくharness設計から生まれると整理する。記事はlive repo context、prompt/cache reuse、structured tools、context reduction、session memory、bounded subagentsの6要素を提示し、Hacker NewsではCodexやClaude Codeのような製品を理解するための実務的な枠組みとして受け止められた。

#coding-agents #agent-harness #repo-context

LLM Reddit Apr 1, 2026 1 min read

Redditで話題のLiquid AI LFM2.5-350M、350Mパラメータでagentic edgeを狙う

LocalLLaMAで注目されたLFM2.5-350Mは、小さな汎用modelではなく、tool useとstructured outputに特化した350M edge modelとして受け止められた。Liquid AIはpretrainingを10Tから28T tokenへ拡張し、large-scale RLを追加したと説明している。

#liquid-ai #small-models #agentic

LLM Reddit Feb 28, 2026 1 min read

Redditで議論: 見えないUnicode指示がAIエージェントを誘導する「Reverse CAPTCHA」評価

r/artificialで共有されたセキュリティ研究は、zero-width文字とUnicode Tagsによる不可視指示がツール利用型LLMエージェントへ与える影響を検証した。公開概要は5モデル・8,308出力の評価を示している。

#ai-security #prompt-injection #unicode