The arXiv paper Ares, submitted on March 9, 2026, proposes dynamic per-step reasoning selection for multi-step LLM agents. The authors report up to 52.7% lower reasoning token usage versus fixed high-effort settings with only minimal drops in task success.
#llm-agents
A Hacker News post highlighted SWE-CI, an arXiv benchmark that evaluates whether LLM agents can sustain repository quality across CI-driven iterations, not just land a single passing patch.
A popular Hacker News post highlighted Agent Safehouse, a macOS tool that wraps Claude Code, Codex, and similar agents in a deny-first sandbox built on sandbox-exec. The project grants project-scoped access by default, blocks sensitive paths at the kernel layer, and ships as a single Bash script under Apache 2.0.
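The deny-first pattern described above can be sketched in SBPL, the Scheme-like profile language sandbox-exec consumes. This is a hypothetical minimal illustration of the approach, not Agent Safehouse's actual ruleset; the paths are placeholders:

```scheme
;; Hypothetical minimal deny-first profile (SBPL) -- illustrative only,
;; not the actual Agent Safehouse policy.
(version 1)
(deny default)                                ; deny-first: everything is off by default
(allow process-exec)                          ; let the agent spawn subprocesses
(allow file-read* file-write*
       (subpath "/Users/me/projects/myapp"))  ; project-scoped file access only
(deny file-read*
      (subpath "/Users/me/.ssh")
      (subpath "/Users/me/.aws"))             ; explicitly block sensitive paths
```

A profile like this would be applied with `sandbox-exec -f profile.sb <command>`. Note that sandbox-exec is marked deprecated by Apple but still ships with macOS, which is presumably why the tool can rely on it.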
A Reddit post in r/artificial drew attention to a security study evaluating how hidden Unicode instructions can steer tool-enabled LLM agents, reporting 8,308 graded outputs across five frontier models.
Andrej Karpathy coined a new term for OpenClaw-like AI agent systems: "Claws." Just as LLM agents formed a new layer on top of LLMs, Claws add a layer on top of LLM agents, providing orchestration, scheduling, persistent context, and tool calls.
A Hacker News post highlighted the SkillsBench paper, which evaluates agent skills across 86 tasks and 11 domains. Curated skills improved average pass rate substantially, while self-generated skills showed no average gain.