Anthropic is using Opus 4.7's vision gains to push Claude into prototypes, slides, and one-pagers. Claude Design is rolling out as a research preview for Pro, Max, Team, and Enterprise subscribers, with design-system ingestion, Canva/PPTX/PDF export, and Claude Code handoff.
#agents
Why it matters: long-running agents need memory that survives beyond one prompt without replaying every message. Cloudflare says Agent Memory is in private beta and keeps useful state available without filling the context window.
HN focused on the plumbing question: does a 14-plus-provider inference layer actually make agent apps easier to operate? Cloudflare framed AI Gateway, Workers AI bindings, and a broader multimodal catalog as one platform, while commenters compared it with OpenRouter and pressed on pricing accuracy, catalog overlap, and deployment trust.
HWE-Bench moves LLM agent evaluation from isolated HDL tasks to repository-scale hardware repairs. The best agent solved 70.7% overall, but performance fell below 65% on complex SoC-level projects.
A new arXiv paper puts a hierarchical agent system at the top of MLE-Bench with a 63.1% medal rate. The result matters because the agent handles design, coding, debugging, training, and tuning from a task description plus data.
HN cared less about the headline speedup than the plumbing: can Android give Claude Code, Codex, Gemini CLI, and other agents a clean terminal surface instead of forcing them through IDE guesswork?
IBM Research’s VAKRA moves agent evaluation from static Q&A into executable tool environments. With 8,000+ locally hosted APIs across 62 domains and reasoning chains of 3 to 7 steps, the benchmark finds a gap between surface-level tool use and the reliability enterprise agents need.
Cloudflare is turning AutoRAG into AI Search, a retrieval primitive agents can create and query from Workers. The open beta adds BM25 plus vector search, built-in storage and index, metadata boosting, and cross-instance search with concrete free and paid limits.
HN treated Cloudflare Email Service less as agent magic and more as a new email sender entering a hostile protocol world. The thread focused on Workers integration, SES alternatives, spam pressure, MTA-STS, and sending limits.
HN read Codex less as a feature list and more as a permission problem. The thread kept circling desktop agents, non-developer workflows, sensitive files, and whether users really want an AI operating their computer.
Cloudflare is trying to make model choice less sticky: AI Gateway now routes Workers AI calls to 70+ models across 12+ providers through one interface. For agent builders, the important part is not the catalog alone but spend controls, retry behavior, and failover in workflows that may chain ten inference calls for one task.
Vercel is making durable execution a first-party primitive for apps and agents. Workflows is now GA after more than 100M beta runs across 1,500+ customers, removing the need for separate queues, workers, and retry infrastructure.