Anthropic released ten ready-to-run agent templates for financial services including pitchbook creation, KYC screening, and month-end close. Claude now works directly in Excel, PowerPoint, Word, and Outlook.
#ai-agents
RSS FeedA benchmark comparing vision agents (browser-use) to structured API agents on the same admin panel found vision agents cost roughly 45x more — and failed to complete the task without a 14-step explicit walkthrough.
HN jumped on the trust problem before the string oddity. A case-sensitive <code>HERMES.md</code> in commit history sent Claude Code requests to extra-usage billing, and the thread zeroed in on how invisible routing rules can burn real money.
HN did not spend long on the version number itself. People jumped straight to the practical test: if Zed is calling 1.0, is the fast Rust editor finally good enough to be where humans, Claude Code, and Codex all meet?
Hacker News was less fascinated by the agent’s “confession” than by the missing basics around it: a production volume deletable from a staging task, backups in the same blast radius, and a broadly scoped token sitting where an agent could grab it.
HN reacted because fake stars are no longer just platform spam; they distort how AI and LLM repos look credible. The thread converged on a practical answer: read commits, issues, code, and real usage instead of treating stars as proof.
Factory raised a $150 million Series C at a $1.5 billion valuation, a fresh signal that AI coding agent companies are racing from developer tools into enterprise infrastructure budgets.
HN did not treat Andon Market as a cute retail stunt for long; the thread quickly moved to disclosure, labor, human steering, and whether an AI boss is an experiment or marketing with extra steps.
OpenAI is turning Codex from a coding workspace into a broader desktop agent. The thread says Codex can use Mac apps, create images, remember work preferences, and connect through 90+ plugins.
Coding agents are being tested on GPU performance work, not just app scaffolding. Cursor says its NVIDIA collaboration produced a 38% geomean speedup across 235 CUDA kernel problems in three weeks.
The post landed because it says plainly what many agent builders already feel. Once a model can call APIs, modify files, run scripts, control a browser, and touch MCP tools, the problem stops being output quality and turns into execution control.
HN did not stay on the word steal for long. The real argument was whether an AI agent can spend a user’s paid LLM credits and GitHub identity on upstream maintenance without a hard opt-in, because once that happens the problem stops being clever automation and becomes consent.