Claude Fable 5 has moved to the top of Artificial Analysis’s GDPval-AA benchmark with a score of 1932. The result puts Anthropic models in three of the top four slots and raises the bar for long-running agentic knowledge work.
HN latched onto a practical shift in coding evals: correctness is no longer enough if the patch would fail human review.
Anthropic is not only shipping a stronger Claude model; it is splitting the same base capability into a broad Fable release and a restricted Mythos track. The package includes $10/$50 token pricing, 30-day safety retention, and automatic fallback to Opus 4.8 for some high-risk requests.
Google Research is turning enterprise RAG into an iterative agent workflow, not a one-shot retrieval step. Its sufficient-context check lifted factuality accuracy by up to 34% and reached 90.1% accuracy in a cross-corpus FramesQA setup.
Google released Gemma 4 QAT checkpoints for edge devices and consumer GPUs. The mobile format cuts Gemma 4 E2B to a 1GB memory footprint while adding Q4_0 and ecosystem-ready weights.
The draw for LocalLLaMA was not just another coding model, but Cohere asking the local-inference crowd to test pre-release weights first.
HN interest centered less on “Claude finds bugs” and more on the shape of a harness security teams can adapt for their own targets.
OpenAI made ChatGPT Lockdown Mode available to all logged-in users and added moderation scores to API generation requests on June 4. The changes move prompt-injection and data-exfiltration defenses from policy language into product controls.
Open-model competition is shifting from leaderboard scores to agent operating costs. NVIDIA says Nemotron 3 Ultra is a 550B MoE model with 5x faster inference and up to 30% lower cost for complex agentic tasks.
The thread’s energy centered on the architecture claim: what does “encoder-free” really mean for a 12B multimodal model?
Local multimodal AI is moving into the 12B class. Google Gemma introduced Gemma 4 12B under Apache 2.0, describing a unified encoder-free design for image, audio, and text inputs.
GitHub expanded the Copilot app technical preview to paid Copilot customers and put local and cloud sandboxes into public preview. The notable shift is not another chat feature: it is execution control for coding agents that can run commands, modify files, and open pull requests.