#ai-agents

LLM Hacker News 47m ago 2 min read

HN Zeroes In on Permissions and Backups After an AI Agent Deletes a Production Database

Hacker News was less fascinated by the agent’s “confession” than by the missing basics around it: a production volume deletable from a staging task, backups in the same blast radius, and a broadly scoped token sitting where an agent could grab it.

#ai-agents #cursor #railway

AI Hacker News 4d ago 1 min read

GitHub fake stars pushed HN past star counts and into trust signals

HN reacted because fake stars are no longer just platform spam; they distort how AI and LLM repos look credible. The thread converged on a practical answer: read commits, issues, code, and real usage instead of treating stars as proof.

#github #open-source #trust

AI Apr 19, 2026 1 min read

Factory raises $150M as AI coding agents chase enterprise budgets

Factory raised a $150 million Series C at a $1.5 billion valuation, a fresh signal that AI coding agent companies are racing from developer tools into enterprise infrastructure budgets.

#ai-agents #coding #funding

AI Hacker News Apr 17, 2026 2 min read

An AI-run SF shop made HN ask who is really managing whom

HN did not treat Andon Market as a cute retail stunt for long; the thread quickly moved to disclosure, labor, human steering, and whether an AI boss is an experiment or marketing with extra steps.

#ai-agents #retail #autonomy

AI sources.twitter Apr 16, 2026 2 min read

Codex desktop gains Mac app control, image work and 90+ plugins

OpenAI is turning Codex from a coding workspace into a broader desktop agent. The thread says Codex can use Mac apps, create images, remember work preferences, and connect through 90+ plugins.

#codex #ai-agents #openai

AI sources.twitter Apr 16, 2026 1 min read

Cursor agents lift NVIDIA Blackwell CUDA kernels by 38%

Coding agents are being tested on GPU performance work, not just app scaffolding. Cursor says its NVIDIA collaboration produced a 38% geomean speedup across 235 CUDA kernel problems in three weeks.

#ai-agents #cuda #nvidia

AI Reddit Apr 16, 2026 2 min read

LocalLLaMA Keeps Pulling the Same Thread: Prompt Guardrails Are Not Enough Once Agents Can Act

The post landed because it says plainly what many agent builders already feel. Once a model can call APIs, modify files, run scripts, control a browser, and touch MCP tools, the problem stops being output quality and turns into execution control.

#ai-agents #agent-safety #guardrails

LLM Hacker News Apr 16, 2026 2 min read

HN Turns a Gas Town Credit Dispute Into a Trust Test for AI Agents

HN did not stay on the word steal for long. The real argument was whether an AI agent can spend a user’s paid LLM credits and GitHub identity on upstream maintenance without a hard opt-in, because once that happens the problem stops being clever automation and becomes consent.

#llm #ai-agents #developer-tools

AI Hacker News Apr 13, 2026 2 min read

Hacker News spotlights Berkeley's warning that top AI agent benchmarks are vulnerable to score hacking

A 520-point Hacker News thread amplified Berkeley's claim that eight major AI agent benchmarks can be pushed toward near-perfect scores through harness exploits instead of genuine task completion.

#ai-agents #benchmarks #evaluation

AI Hacker News Apr 12, 2026 1 min read

Berkeley Shows How Benchmark Hacking Can Inflate AI Agent Scores

UC Berkeley researchers say eight major AI agent benchmarks can be driven to near-perfect scores without actually solving the underlying tasks. Their warning is straightforward: leaderboard numbers are only as trustworthy as the evaluation design behind them.

#benchmarks #ai-agents #evaluation

AI Hacker News Apr 11, 2026 2 min read

Hacker News Debates a Hard Limit in Personal AI Agents: Memory Reliability

A Hacker News discussion is focusing on a blunt OpenClaw critique built around a simple claim: persistent AI agents are only useful if their memory stays reliable over time. The post argues that flashy demos matter less than whether an agent can keep the right context without silent failure.

#ai-agents #memory #autonomy

LLM sources.twitter Apr 10, 2026 2 min read

Databricks argues memory, not reasoning alone, is the next scaling bottleneck for AI agents

On April 10, 2026, Databricks AI Research published Memory Scaling for AI Agents, arguing that agent performance can improve as external memory grows. The post reports gains in both accuracy and efficiency from labeled examples, raw conversation logs, and organizational knowledge.

#databricks #ai-agents #memory