#llm-agents

LLM Mar 14, 2026 2 min read

Ares Paper Shows Dynamic Reasoning Can Cut LLM Agent Tokens by Up to 52.7%

The arXiv paper Ares, submitted on March 9, 2026, proposes dynamic per-step reasoning selection for multi-step LLM agents. The authors report up to 52.7% lower reasoning token usage versus fixed high-effort settings with only minimal drops in task success.

#llm-agents #reasoning #efficiency

LLM Hacker News Mar 10, 2026 2 min read

SWE-CI Pushes Coding-Agent Evaluation From One-Shot Fixes to Long-Horizon Maintenance

Hacker News highlighted SWE-CI, an arXiv benchmark that evaluates whether LLM agents can sustain repository quality across CI-driven iterations, not just land a single passing patch.

#llm-agents #software-engineering #benchmarks

LLM Hacker News Mar 9, 2026 2 min read

Agent Safehouse brings deny-first macOS sandboxing to local coding agents

A popular Hacker News post highlighted Agent Safehouse, a macOS tool that wraps Claude Code, Codex and similar agents in a deny-first sandbox using sandbox-exec. The project grants project-scoped access by default, blocks sensitive paths at the kernel layer, and ships as a single Bash script under Apache 2.0.

#llm-agents #macos #sandboxing

LLM Reddit Feb 28, 2026 2 min read

Reddit Highlights “Reverse CAPTCHA” Study on Invisible Unicode Prompt Injection in AI Agents

A Reddit post in r/artificial drew attention to a security study evaluating how hidden Unicode instructions can steer tool-enabled LLM agents, reporting 8,308 graded outputs across five frontier models.

#ai-security #prompt-injection #unicode

LLM Hacker News Feb 22, 2026 1 min read

Karpathy: "Claws" Are a New Layer on Top of LLM Agents

Andrej Karpathy coined a new term for OpenClaw-like AI agent systems: "Claws." Just as LLM agents were a new layer on top of LLMs, Claws provide orchestration, scheduling, persistent context, and tool calls on top of LLM agents.

#llm-agents #karpathy #openclaw

LLM Hacker News Feb 17, 2026 1 min read

SkillsBench Finds Self-Generated Agent Skills Add No Average Benefit

A Hacker News post highlighted the SkillsBench paper, which evaluates agent skills across 86 tasks and 11 domains. Curated skills improved average pass rate substantially, while self-generated skills showed no average gain.

#llm-agents #benchmark #evaluation