The LocalLLaMA angle is not just the 1000+ tps headline but also whether FP4, DFlash, and commodity GPU kernels can be reproduced outside Xiaomi’s hosted trial.
LLM
The HN interest came from a practical complaint: advertised context size does not map cleanly to the part of the window an LLM can use well.
NVIDIA says its GB300 NVL72 delivered up to 20x more concurrent agentic coding capacity per megawatt than H200 on Artificial Analysis’s new AA-AgentPerf benchmark. The test measures concurrent AI agents under service-level objectives, not just raw token throughput.
MiniMax has moved M3 from model teaser to open-weight distribution. The Hugging Face card lists about 428B total parameters, 23B activated parameters, and a 1M-token context window.
Model access changed through export control, not a normal product decision. Anthropic said the directive forced it to disable Fable 5 and Mythos 5 for all customers while leaving other Claude models online.
The r/MachineLearning thread captured a practical benchmark problem: closed models dominate eval tables even when their results are not reproducible in the old Papers with Code sense.
Google DeepMind released DiffusionGemma, a 26B MoE open model that uses text diffusion instead of token-by-token decoding. The pitch is up to 4x faster generation on dedicated GPUs for local, interactive workflows.
The acquisition points Codex toward enterprise agents that keep working after a laptop closes. OpenAI says Codex now has more than 5 million weekly users, up 400% from earlier this year, while Ona brings cloud environments used by 2 million developers.
Claude Fable 5 has moved to the top of Artificial Analysis’s GDPval-AA benchmark with a score of 1932. The result puts Anthropic models in three of the top four slots and raises the bar for long-running agentic knowledge work.
HN latched onto a practical shift in coding evals: correctness is no longer enough if the patch would fail human review.
Anthropic is not only shipping a stronger Claude model; it is splitting the same base capability into a broad Fable release and a restricted Mythos track. The package includes $10/$50 token pricing, 30-day safety retention, and automatic fallback to Opus 4.8 for some high-risk requests.
Google Research is turning enterprise RAG into an iterative agent workflow, not a one-shot retrieval step. Its sufficient-context check lifted factuality accuracy by up to 34% and reached 90.1% accuracy in a cross-corpus FramesQA setup.