The money is following the layer that decides which model gets each request. OpenRouter says weekly traffic rose 5x in six months to 25 trillion tokens, while its platform now spans 400+ models and more than 8 million users.
LLM
LocalLLaMA focused less on OCR novelty and more on the practical package: open weights, self-hosting, and a low VRAM floor.
The thread’s useful tension was not whether AI can write code fast, but whether slower review loops produce code teams can actually trust.
A Daniel Miessler post, which has drawn more than 269K views, says Claude Code is preparing a /workflows feature. The signal is a shift from one-off coding prompts toward repeatable SOP execution inside enterprise AI systems.
xAI’s next Grok foundation model is moving from training into fine-tuning at 1.5T parameters, three times the size of the current 0.5T production model. Musk says Cursor data was added and public release is 2 to 3 weeks away.
The thread split between the convenience of “local LLM in Chrome” and corrections about WebGPU acceleration, model identity, and browser-controlled limits.
The discussion centered less on parallel agents as a novelty and more on reviewability, worktree setup, and the value of local-first storage.
DeepSeek turned a temporary V4-Pro API discount into standard pricing, intensifying the cost race around frontier-class LLM access. The posted table cuts output pricing from $3.48 to $0.87 per million tokens.
At Google I/O 2026 on May 19, Google unveiled Gemini 3.5 Flash—which outperforms Gemini 3.1 Pro across all benchmarks at 4× the speed and half the API cost—alongside Gemini Spark, a 24/7 personal AI agent that works in the background and can be reached directly via Gmail. Spark enters beta for Google AI Ultra subscribers in the US starting the week of May 26.
Anthropic has acquired Stainless, the SDK and MCP platform powering every official Anthropic SDK, in a deal valued at over $300 million. Also used by OpenAI, Google, and Cloudflare, Stainless will shut down its hosted services while its team and technology join Anthropic. The deal marks Anthropic's fourth acquisition in six months, completing key layers of its agent stack strategy.
A viral LocalLLaMA post describes how Qwen3.6 35B A3B transformed complex workflows: the author combined Codex for task execution with skill documentation, then fed those skills to the pi agent to automate VPS management, PDF conversion, and more.
A community user achieved 110 tokens/second running Qwen3.6 35B A3B on an RTX 4070 Super 12GB via ik_llama.cpp, a fork with superior CPU offload optimization that significantly outperforms upstream llama.cpp's Multi-Token Prediction implementation.