Hacker News picked up Z.ai's GLM-5.1 as a model aimed less at one-shot wins and more at sustained agentic work. Z.ai reports 58.4 on SWE-Bench Pro, 42.7 on NL2Repo, 66.5 on Terminal Bench 2.0, and long-horizon runs that keep improving through hundreds of iterations and thousands of tool calls.
GitHub Changelog's March 19, 2026 X post announced that GPT-5.3-Codex is the first long-term support model for Copilot Business and Copilot Enterprise. GitHub says the model launched on February 5, 2026, stays available through February 4, 2027, and becomes the new base model by May 17, 2026.
GitHub Changelog said on April 3, 2026 that GPT-5.1-Codex, GPT-5.1-Codex-Max, and GPT-5.1-Codex-Mini were deprecated across all Copilot surfaces as of April 1. GitHub tells organizations to move workflows and model policies to supported models, with GPT-5.3-Codex named as the replacement.
GitHub Changelog's April 7, 2026 X post said Copilot CLI can now connect to Azure OpenAI, Anthropic, and other OpenAI-compatible endpoints, or run fully local models instead of GitHub-hosted routing. GitHub's changelog adds that offline mode disables telemetry, unauthenticated use is possible with provider credentials alone, and built-in sub-agents inherit the chosen provider.
A LocalLLaMA thread pulled attention to DFlash, a block-diffusion draft model for speculative decoding whose paper claims lossless acceleration above 6x and direct support for vLLM, SGLang, and selected Transformers backends.
A LocalLLaMA post with 117 points spotlights AgentHandover, a Mac menu-bar app that watches repeated workflows, turns them into agent-executable Skills, and keeps the whole pipeline local with MCP hooks for Codex, Claude Code, and other compatible tools.
A LocalLLaMA post with roughly 350 points argues that Gemma 4 26B A3B becomes unusually effective for local coding-agent and tool-calling workflows when paired with the right runtime settings, contrasting it with prompt-caching and function-calling issues the poster saw in other local-model setups.
A Hacker News thread with about 240 points focused attention on Anthropic’s April 6 announcement that it signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity starting in 2027, alongside claims of more than $30 billion in run-rate revenue and over 1,000 seven-figure business customers.
A recent LocalLLaMA discussion shared results from Mac LLM Bench, an open benchmark workflow for Apple Silicon systems. The most useful takeaway is practical: dense 32B models hit a clear wall on a 32 GB MacBook Air M5, while some MoE models offer a much better latency-to-capability tradeoff.
A high-signal LocalLLaMA post described a port of llama2.c to classic Mac OS that runs Karpathy’s TinyStories 260K model on a stock iMac G3. The project is compelling because most of the work is systems engineering: endianness fixes, memory partition management, and layout debugging on vintage hardware.
A recent Show HN post highlighted GuppyLM, a tiny education-first language model trained on 60K synthetic conversations with a deliberately simple transformer stack. The project stands out because readers can inspect and run the whole pipeline in Colab or directly in the browser.
GitHub’s April 6, 2026 X post said Copilot cloud agent is no longer confined to pull-request workflows. GitHub’s changelog says the agent can now work on a branch before a PR exists, generate implementation plans, and conduct deeper repository research.