Claude Fable 5 has moved to the top of Artificial Analysis’s GDPval-AA benchmark with a 1932 score. The result puts Anthropic models in three of the top four slots and raises the bar for long-running agentic knowledge work.
Claude Fable 5 has moved to the top of Artificial Analysis’s GDPval-AA benchmark with a 1932 score. The result puts Anthropic models in three of the top four slots and raises the bar for long-running agentic knowledge work.
Anthropic points to infrastructure, not only model intelligence, as the bottleneck for scientific agents. In an NCBI Virus retrieval task, accuracy rose to nearly 100% after adding a deterministic gget virus layer.
Google Research is turning enterprise RAG into an iterative agent workflow, not a one-shot retrieval step. Its sufficient-context check lifted factuality accuracy by up to 34% and reached 90.1% accuracy in a cross-corpus FramesQA setup.
HN interest centered less on “Claude finds bugs” and more on the shape of a harness security teams can adapt for their own targets.
Open-model competition is shifting from leaderboard scores to agent operating costs. NVIDIA says Nemotron 3 Ultra is a 550B MoE model with 5x faster inference and up to 30% lower cost for complex agentic tasks.
Microsoft Discovery became generally available on June 2 for organizations building governed R&D workflows. The platform connects specialized agents, scientific knowledge, simulation tools, validation data, and a new local preview app for researchers.
GitHub expanded the Copilot app technical preview to paid Copilot customers and put local and cloud sandboxes into public preview. The notable shift is not another chat feature: it is execution control for coding agents that can run commands, modify files, and open pull requests.
Perplexity is replacing serial search calls with generated Python that composes retrieval primitives inside agent harnesses. In one CVE advisory case study, it says token use fell 85.1%, from 288.7K to 42.9K.
The useful number in the Reddit report was not the hardware spec; it was a reported 12% tool-call formatting error rate.
NVIDIA says Vera is now in full production and can complete agentic workloads 1.8x faster than x86 CPUs. OpenAI, Anthropic, SpaceXAI, ByteDance, CoreWeave, and OCI are among the names tied to adoption or evaluation.
NVIDIA is packaging a 550B-parameter MoE model with agent tooling instead of treating the model as a standalone release. The pitch is concrete: up to 5x faster inference, up to 30% lower cost, and availability beginning June 4.
Anthropic’s May 29 platform notes move Claude Managed Agents deeper into AWS operations. Webhooks, multiagent orchestration, and self-hosted sandboxes are now available on Claude Platform on AWS, with new IAM actions and a managed policy for self-hosted execution.