#agents

LLM May 29, 2026 2 min read

Gemini 3.5 Flash reaches GA as Google turns Search into an agent surface

Google’s I/O 2026 AI story is about distribution as much as models. Gemini 3.5 Flash is now generally available across API, Antigravity, Android Studio, enterprise tools, Search, and the Gemini app, while Gemini Omni Flash brings video generation into the same push.

#google #gemini #agents

LLM X/Twitter May 29, 2026 1 min read

Opus 4.8 beats GPT-5.5 by 121 points on GDPval-AA agent benchmark

Claude Opus 4.8 is showing its strongest early signal in agentic work, not only coding. Artificial Analysis says the model scored 1890 on GDPval-AA, 121 points ahead of GPT-5.5 xhigh.

#anthropic #claude #benchmark

LLM May 28, 2026 2 min read

Mistral Vibe folds work agents and coding PRs into one subscription

Mistral is turning Le Chat into Vibe, a combined work and coding agent. The launch adds Work Mode, remote Code Mode, a VS Code extension, CLI updates, and paid plans starting at $14.99 per month.

#mistral #vibe #agents

LLM May 27, 2026 2 min read

Benchmark audit finds 25.7% flawed tasks and shifts agent rankings

The weak point in model leaderboards may be the tasks, not only the models. A new arXiv paper reports critical issues in more than 25.7% of evaluated benchmark tasks and shows ranking shifts after filtering flawed items.

#benchmarks #swe-bench #agents

AI X/Twitter May 27, 2026 1 min read

Grok Build beta opens to X Premium+ and SuperGrok users

xAI is pushing Grok from chat into app and automation building. The beta combines Plan Mode, Imagine media generation, and a CLI for automations, and the launch post drew more than 53 million views.

#xai #grok #agents

AI X/Twitter May 27, 2026 1 min read

Anthropic moves Claude agent safety from prompts to sandboxes

Claude products now touch real tools, so the risk question is shifting from model persuasion to execution boundaries. Anthropic says users approved about 93% of Claude Code permission prompts, an approval rate high enough to weaken human-in-the-loop review as a defense.

#anthropic #claude #agents

LLM Hacker News May 24, 2026 1 min read

Kanbots turns every Kanban card into a local coding-agent workspace

The Hacker News discussion centered less on parallel agents as a novelty and more on reviewability, worktree setup, and the value of local-first storage.

#agents #kanban #local-first

LLM May 19, 2026 1 min read

Google I/O 2026: Gemini 3.5 Flash and Managed Agents API Launched

Google unveiled Gemini 3.5 Flash at I/O 2026, combining frontier performance with agentic capabilities. The new Managed Agents API provisions a full sandboxed agent environment with a single API call.

#google #gemini #llm

AI May 7, 2026 1 min read

IBM Think 2026: Full-Stack Blueprint for the Multi-Agent Enterprise

IBM CEO Arvind Krishna unveiled a multi-agent AI operating model at Think 2026 and launched next-gen watsonx Orchestrate, IBM Bob, IBM Concert, and IBM Sovereign Core alongside it, all aimed at closing the gap between AI investment and ROI.

#ibm #enterprise #agents

LLM X/Twitter May 3, 2026 1 min read

Karpathy at Sequoia Ascent 2026: Three New Frontiers LLMs Open Beyond Speed

Andrej Karpathy shared highlights from his Sequoia Ascent 2026 fireside chat, arguing that LLMs open genuinely new categories of functionality, not just faster versions of what already existed.

#karpathy #llm #agents

AI May 1, 2026 2 min read

AWS opens Amazon Quick to free signups and brings GPT-5.5, Codex to Bedrock

AWS is packaging its AI story as a full work stack instead of a loose collection of services. On April 28, it opened Amazon Quick to Free and Plus signups with no AWS account required, expanded Amazon Connect into four agentic business solutions, and said GPT-5.5, Codex, and OpenAI-powered Managed Agents are coming to Amazon Bedrock in limited preview.

#aws #bedrock #openai

AI May 1, 2026 2 min read

Stripe gives AI agents a checkout path without handing over card numbers

Stripe is trying to solve the trust problem at the point of payment, not just the shopping flow before it. Its new Link wallet for agents lets users approve spend requests and then returns a one-time-use card or Shared Payment Token, while keeping raw payment credentials away from the agent.

#stripe #agents #payments