HN’s interest in Stage centered less on the chapter UI itself and more on the harder question: how humans stay responsible for code that agents helped create.
HN pushed the Laravel thread past 200 points because the uncomfortable part was not any single cloud recommendation, but the idea of agent context becoming ad space.
HN latched onto Artifacts because it treats Git not as a human workflow, but as storage for millions of agent sessions.
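For the shape of that idea: one commit per session in an append-only repo, so millions of transcripts get Git's deduplication, compression, and history for free. A minimal sketch assuming a one-JSON-file-per-session layout; the layout and `commit_session` helper are hypothetical, not Artifacts' actual scheme:

```python
import json
import subprocess
import uuid
from pathlib import Path

def commit_session(repo: Path, transcript: list[dict]) -> None:
    """Append one agent session to a Git repo as a single commit."""
    session_file = repo / "sessions" / f"{uuid.uuid4()}.json"
    session_file.parent.mkdir(parents=True, exist_ok=True)
    session_file.write_text(json.dumps(transcript, indent=2))
    # Plain git CLI calls; the repo is treated as a write-only log, not a human workspace.
    subprocess.run(["git", "-C", str(repo), "add", str(session_file)], check=True)
    subprocess.run(["git", "-C", str(repo), "commit", "-m", "agent session"], check=True)
```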
OpenAI says more than 3 million developers use Codex each week, and the desktop app is now moving beyond code edits. The update adds background computer use on macOS, an in-app browser, gpt-image-1.5 image generation, 90+ new plugins, PR review workflows, SSH devboxes in alpha, automations, and memory in preview.
HN liked the duct-tape energy of AutoProber, but the thread quickly moved from demo awe to safety and precision. The combination of a CNC, a microscope, an oscilloscope, and an agent workflow is compelling; it also makes every millimeter and every stop condition matter.
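That safety point is concrete enough to sketch: an agent-driven motion layer needs a hard envelope check between the model and the machine. A minimal illustration, with hypothetical limits and a hypothetical `check_move` helper rather than anything from AutoProber:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    """Axis-aligned travel limits in millimeters (illustrative values)."""
    x: tuple[float, float] = (0.0, 120.0)
    y: tuple[float, float] = (0.0, 80.0)
    z: tuple[float, float] = (-2.0, 25.0)  # the probe must never plunge below -2 mm

def check_move(env: Envelope, x: float, y: float, z: float) -> None:
    """Reject any agent-issued target outside the envelope before it
    reaches the motion controller; raising here is the stop condition."""
    for axis, val, (lo, hi) in (("x", x, env.x), ("y", y, env.y), ("z", z, env.z)):
        if not lo <= val <= hi:
            raise ValueError(f"{axis}={val} mm outside safe range [{lo}, {hi}]")
```

The design choice worth noting is that the check sits outside the agent: the model can propose any coordinate it likes, but nothing it says can widen the envelope.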
Anthropic is using Opus 4.7's vision gains to push Claude into prototypes, slides, and one-pagers. Claude Design is rolling out as a research preview for Pro, Max, Team, and Enterprise subscribers, with design-system ingestion, Canva/PPTX/PDF export, and Claude Code handoff.
Why it matters: long-running agents need memory that survives beyond one prompt without replaying every message. Cloudflare says Agent Memory is in private beta and keeps useful state available without filling the context window.
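The pattern behind the product is simple to sketch: distill durable facts out of the conversation, then rehydrate a compact context block on demand. A minimal file-backed illustration of that pattern; the `remember`/`recall` helpers and JSON store are hypothetical stand-ins, not Cloudflare's Agent Memory API:

```python
import json
from pathlib import Path

STORE = Path("agent_memory.json")  # hypothetical stand-in for a real KV store

def remember(key: str, fact: str) -> None:
    """Persist a distilled fact so later sessions can load it
    without replaying the full conversation history."""
    memory = json.loads(STORE.read_text()) if STORE.exists() else {}
    memory[key] = fact
    STORE.write_text(json.dumps(memory, indent=2))

def recall(keys: list[str]) -> str:
    """Build a compact context block from stored facts: the next prompt
    gets a few lines of state instead of the whole transcript."""
    memory = json.loads(STORE.read_text()) if STORE.exists() else {}
    return "\n".join(f"- {k}: {memory[k]}" for k in keys if k in memory)
```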
HN focused on the plumbing question: does a 14-plus-provider inference layer actually make agent apps easier to operate? Cloudflare framed AI Gateway, Workers AI bindings, and a broader multimodal catalog as one platform, while commenters compared it with OpenRouter and pressed on pricing accuracy, catalog overlap, and deployment trust.
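The operational question is easy to make concrete: a gateway's core job is one request shape fanned out over interchangeable backends with ordered fallback. A minimal sketch; `Provider`, `Gateway`, and the fallback policy are hypothetical, not Cloudflare's design:

```python
from typing import Protocol

class Provider(Protocol):
    def complete(self, prompt: str) -> str: ...

class Gateway:
    """Route one request shape across many providers, trying each in order."""
    def __init__(self, providers: dict[str, Provider], order: list[str]):
        self.providers = providers
        self.order = order

    def complete(self, prompt: str) -> str:
        errors = []
        for name in self.order:
            try:
                return self.providers[name].complete(prompt)
            except Exception as exc:  # provider down, rate-limited, etc.
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The commenters' pricing and catalog questions live exactly here: the abstraction is easy, but keeping fourteen-plus backends' prices, models, and quirks accurate behind it is the hard part.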
HWE-Bench moves LLM agent evaluation from isolated HDL tasks to repository-scale hardware repairs. The best agent solved 70.7% of tasks overall, but success fell below 65% on complex SoC-level projects.
A new arXiv paper puts a hierarchical agent system at the top of MLE-Bench with a 63.1% medal rate. The result matters because the agent handles design, coding, debugging, training, and tuning from a task description plus data.
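The contribution is the control structure, which is worth sketching in outline: a planner decomposes the problem, workers execute each stage, and a reviewer gates progression. A schematic sketch in which `plan_fn`, `execute_fn`, and `review_fn` stand in for LLM calls; none of this is the paper's actual system:

```python
def run_pipeline(task: str, plan_fn, execute_fn, review_fn, max_rounds: int = 3) -> dict:
    """Hierarchical loop: plan stages, execute each one, and let a
    reviewer decide whether a stage passes or gets another attempt."""
    stages = plan_fn(task)  # e.g. ["design", "code", "debug", "train", "tune"]
    results = {}
    for stage in stages:
        for _ in range(max_rounds):
            output = execute_fn(stage, task, results)
            ok, feedback = review_fn(stage, output)
            if ok:
                results[stage] = output
                break
            # Failed review: fold the feedback into the task and retry the stage.
            task = f"{task}\n[reviewer feedback on {stage}]: {feedback}"
    return results
```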
HN cared less about the headline speedup than about the plumbing: can Android give Claude Code, Codex, Gemini CLI, and other agents a clean terminal surface instead of forcing them through IDE guesswork?
IBM Research’s VAKRA moves agent evaluation from static Q&A into executable tool environments. With 8,000+ locally hosted APIs across 62 domains and 3-7-step reasoning chains, the benchmark finds a gap between surface-level tool use and reliable enterprise agents.
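What "executable tool environments" means in practice: grading is based on whether calls actually run and the final result is right, not on whether the transcript looks plausible. A minimal sketch of such a harness; `score_chain` and its scoring fields are hypothetical, not VAKRA's code:

```python
from typing import Callable

def score_chain(agent_calls: list[tuple[str, dict]],
                tools: dict[str, Callable[..., object]],
                expected_final: object) -> dict:
    """Execute an agent's proposed tool calls against live, locally hosted
    APIs and grade on execution, not on transcript text."""
    executed, last = 0, None
    for name, kwargs in agent_calls:
        try:
            last = tools[name](**kwargs)  # real call, real return value
            executed += 1
        except Exception:
            break  # the chain fails at the first bad call
    return {
        "steps_executed": executed,
        "chain_complete": executed == len(agent_calls),
        "answer_correct": last == expected_final,
    }
```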