LocalLLaMA reacted to this post because it brought hard numbers, not vendor marketing: a dual RTX 5060 Ti 16GB setup pushing Qwen3.6 27B to roughly 60 tok/s with a 204k context window.
HN treated Mistral Medium 3.5 as more than another model drop, focusing on four-GPU self-hosting, open weights, and remote coding agents rather than headline scores alone.
Hacker News liked the joke, but the real draw was OpenAI showing how a playful reward signal inside the Nerdy personality leaked creature metaphors into GPT-5.x behavior.
Anthropic is pushing Claude out of the chat box and into the software stack where designers, video editors, and musicians already work. The company says its April 28 release connects Claude to Adobe’s 50+ tool surface, Blender, Autodesk Fusion, SketchUp, Splice, Ableton, and more.
NVIDIA is targeting the cost bottleneck in multimodal agents, not just the demo factor. Nemotron 3 Nano Omni claims up to 9x higher throughput, a 256K context window, and six leaderboard wins for document, video, and audio understanding.
Cursor is pushing coding agents out of the editor and into infrastructure. Its new SDK exposes the same runtime and harness behind Cursor itself, targeting CI/CD jobs, cloud execution, and embedded agent workflows inside other products.
LocalLLaMA paid attention to Granite 4.1 because IBM went in the opposite direction from giant reasoning hype: a broad release built around dense 3B, 8B, and 30B language models tuned for instruction following and tool calling. Comments welcomed the extra competition, but also pushed back on how strong the benchmarks really are.
LocalLLaMA lit up because Xiaomi MiMo dropped an MIT-licensed MoE with 1.02T total parameters, 42B active parameters, and a 1M-token context window. The excitement was real, but so was the hardware reality check: people loved the openness and agentic claims while joking about how many serious GPUs you still need.
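The hardware reality check is easy to make concrete. A back-of-envelope sketch, using only the release's parameter counts and standard quantization sizes (the `weight_gib` helper is just for illustration): in an MoE, every expert's weights must be resident in memory even though only ~42B parameters are active per token.

```python
# Rough weight-memory estimate for a 1.02T-total / 42B-active MoE.
# All experts must fit in memory; only the active subset runs per token.

def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB for a given quantization."""
    return num_params * bits_per_param / 8 / 2**30

TOTAL_PARAMS = 1.02e12   # total parameters (from the release)
ACTIVE_PARAMS = 42e9     # active parameters per token

for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    print(f"{label}: ~{weight_gib(TOTAL_PARAMS, bits):,.0f} GiB of weights")
# Even at 4-bit, ~475 GiB of weights before any KV cache:
# hence the jokes about how many serious GPUs you still need.
```

KV cache for a 1M-token context comes on top of this, so the 4-bit figure is a floor, not a budget.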
Hacker News paid attention to Mistral Medium 3.5 because the size-to-capability tradeoff looked real: a 128B dense model with a 256K context window, open weights, and self-hosting claims that do not immediately drift into fantasy. The launch also tied the model to remote coding agents in Vibe and a new Work mode in Le Chat.
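The self-hosting math roughly checks out, which is part of why the thread took the claim seriously. A minimal sketch using only the 128B figure from the launch; the GPU sizes are an assumption for illustration, and KV cache and activations are extra:

```python
# Rough weight-memory math for a dense 128B-parameter model.
PARAMS = 128e9
GIB = 2**30

fp16_gib = PARAMS * 2 / GIB    # 2 bytes per parameter
int4_gib = PARAMS * 0.5 / GIB  # 0.5 bytes per parameter

print(f"fp16 weights: ~{fp16_gib:.0f} GiB")  # ~238 GiB
print(f"int4 weights: ~{int4_gib:.0f} GiB")  # ~60 GiB

# At 4-bit, the weights alone fit across four 24 GB GPUs (96 GB total,
# an assumed configuration), leaving headroom for some KV cache --
# which is what keeps the self-hosting claim out of fantasy territory.
```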
Hacker News piled onto a Claude Code bug report because the trigger sounded absurd and expensive: having HERMES.md in recent git commit messages could route requests to paid overage instead of the included Max quota. What kept the thread hot was not only the reproduction, but the fight over refunds before Anthropic said affected users would get both refunds and extra credits.
The important shift here is distribution, not one more model endpoint. OpenAI says GPT-5.5, Codex, and Bedrock Managed Agents are entering limited preview on AWS, giving enterprises a way to keep identity, security, and procurement inside Amazon's stack.
LocalLLaMA latched onto one detail immediately: dense 128B. Mistral Medium 3.5 drew attention because it tries to bundle reasoning, coding, and agent work into a model people can still imagine self-hosting.