Daniel Vaughan’s Gemma 4 writeup tests whether a local model can function as a real Codex CLI agent, with the answer depending less on benchmark claims than on very specific serving choices. The key lesson is that Apple Silicon required llama.cpp plus `--jinja`, KV-cache quantization, and `web_search = "disabled"`, while a GB10 box worked through Ollama 0.20.5.
A large Hacker News thread turned a Claude Code quota complaint into a deeper argument about how prompt caching, background sessions, and auto-compacts behave inside 1M-context agent workflows. The GitHub issue author published April 9, 2026 usage logs, and the discussion quickly shifted from “limits feel worse” to cache accounting and quota transparency.
GitHub put the Copilot SDK into public preview on April 2, 2026, exposing the same runtime behind Copilot cloud agent and Copilot CLI. The SDK ships across five languages with tool use, streaming, permissions, OpenTelemetry, and BYOK support.
On April 8, 2026, Gemini introduced notebooks as a new project layer for grouping past chats, files and instructions. Google says notebooks sync with NotebookLM and are rolling out first on the web for Google AI Ultra, Pro and Plus subscribers.
On April 9, 2026, Google said Gemini can now generate interactive visualizations directly in chat. Google’s product page says the rollout adds functional simulations, adjustable parameters and 3D exploration to the Gemini app for global users on the Pro model.
A fresh r/LocalLLaMA post published DFlash benchmarking on M5 Max with MLX 0.31.1 and reported 127.07 tok/s and a 4.13x speedup on Qwen3.5-9B. The most useful part is not the headline number but the post’s clear reproduction setup and bandwidth-bound interpretation.
Google says coding agents often produce stale Gemini API code because model training data has a cutoff date, and is shipping Docs MCP plus Developer Skills as the fix. Google reports that, used together, the two reach a 96.3% pass rate with 63% fewer tokens per correct answer than vanilla prompting on its eval set.
Google is adding Flex and Priority service tiers to the Gemini API so developers can choose lower-cost synchronous inference for background work or higher-assurance routing for critical traffic. The change gives agent builders a cleaner way to separate cost and reliability without splitting architectures across multiple APIs.
Amazon Bedrock AgentCore Evaluations packages judge-model scoring, ground-truth testing, CloudWatch observability, and custom evaluators into a managed workflow for agent QA. The announcement matters because it frames agent quality as an ongoing production discipline rather than a prompt-tuning exercise.
AWS has moved Security Agent and DevOps Agent into general availability, turning its re:Invent frontier-agent concept into commercial products for security testing and multicloud incident operations. The key signal is that AWS is now selling long-running autonomous agents as operational tooling, not just demo workflows.
A 54-point Reddit post flagged merged PR #19441 as the moment qwen3-omni-moe and qwen3-asr support reached llama.cpp, with commenters focused on local multimodal and ASR use cases.
GitHub now lets users assign Dependabot alerts to AI coding agents including Copilot, Claude, and Codex. The agents can analyze the advisory, open a draft pull request, and attempt to fix test failures, but GitHub says humans still need to review the output before merging.