Perplexity is replacing serial search calls with generated Python that composes retrieval primitives inside agent harnesses. In one CVE advisory case study, it says token use fell 85.1%, from 288.7K to 42.9K.
The useful number in the Reddit report was not the hardware spec but a 12% tool-call formatting error rate.
NVIDIA says Vera is now in full production and can complete agentic workloads 1.8x faster than x86 CPUs. OpenAI, Anthropic, SpaceXAI, ByteDance, CoreWeave, and OCI are among the names tied to adoption or evaluation.
NVIDIA is packaging a 550B-parameter MoE model with agent tooling instead of treating the model as a standalone release. The pitch is concrete: up to 5x faster inference, up to 30% lower cost, and availability beginning June 4.
Anthropic’s May 29 platform notes move Claude Managed Agents deeper into AWS operations. Webhooks, multiagent orchestration, and self-hosted sandboxes are now available on Claude Platform on AWS, with new IAM actions and a managed policy for self-hosted execution.
Google’s I/O 2026 AI story is about distribution as much as models. Gemini 3.5 Flash is now generally available across API, Antigravity, Android Studio, enterprise tools, Search, and the Gemini app, while Gemini Omni Flash brings video generation into the same push.
Claude Opus 4.8 is showing its strongest early signal in agentic work, not only coding. Artificial Analysis says the model scored 1890 on GDPval-AA, 121 points ahead of GPT-5.5 xhigh.
Mistral is turning Le Chat into Vibe, a combined work and coding agent. The launch adds Work Mode, remote Code Mode, a VS Code extension, CLI updates, and paid plans starting at $14.99 per month.
The weak point in model leaderboards may be the tasks, not only the models. A new arXiv paper reports critical issues in more than 25.7% of evaluated benchmark tasks and shows ranking shifts after filtering flawed items.
xAI is pushing Grok from chat into app and automation building. The beta combines Plan Mode, Imagine media generation, and a CLI for automations, and the launch post drew more than 53 million views.
Claude products now touch real tools, so the risk question is shifting from model persuasion to execution boundaries. Anthropic says users approved about 93% of Claude Code permission prompts, an approval rate high enough to weaken human-in-the-loop review as a defense.
The discussion centered less on parallel agents as a novelty and more on reviewability, worktree setup, and the value of local-first storage.