GitHub says Copilot cloud agent is no longer limited to pull-request workflows. The April 1 release adds branch-first execution, pre-code implementation plans, and deep repository research sessions.
A r/LocalLLaMA stress test claims Gemma 4 26B A4B remained coherent at roughly 94% of a 262,144-token context window in llama.cpp. The post is anecdotal, but it is valuable because it pairs the claim with concrete tuning details and failure modes.
A detailed r/LocalLLaMA benchmark reports single- and dual-GPU numbers for Qwen3.5-27B int4 on Intel Arc Pro B70 32GB using Intel’s vLLM fork. The setup is still finicky, but the measurements outline a practical path for local serving on Intel hardware.
Cloudflare moved Workers AI into larger-model territory on March 19, 2026 by adding Moonshot AI’s Kimi K2.5. The company is pitching a single stack for durable agent execution, large-context inference, and lower-cost open-model deployment.
On April 2, 2026 NVIDIA said it has optimized Google’s latest Gemma 4 models for RTX PCs, DGX Spark, and Jetson edge modules. The move is aimed at turning compact multimodal models into practical local agent stacks rather than leaving them mainly in the cloud.
GitHub said on April 7, 2026 that Copilot CLI can now use a developer’s own model provider or fully local models. The change adds Azure OpenAI, Anthropic, offline mode, and optional GitHub auth while keeping the same agentic terminal workflow.
Shopify used an X post to launch the Shopify AI Toolkit as a direct bridge between general-purpose coding agents and the Shopify platform. The docs show a first-party package of documentation access, API schemas, validation, and store execution rather than a loose collection of prompts.
Cursor used an April 3 X post to push developers toward its new Cursor 3 interface. The larger move is shifting from an IDE-side AI panel to a workspace for coordinating many agents across local, cloud, and remote environments.
A LocalLLaMA implementation report says a native MLX DFlash runtime can speed up Qwen inference on Apple Silicon by more than 2x in several settings. The notable part is not only the throughput gain, but the claim that outputs remain bit-for-bit identical to the greedy baseline.
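The bit-for-bit claim is easy to check in principle: greedy decoding is deterministic, so an accelerated runtime that preserves the baseline's argmax choices must emit exactly the same token ids. A minimal sketch of such a check, assuming both runtimes expose their outputs as plain lists of token ids (the function name and sample ids here are illustrative, not from the report):

```python
def assert_bitwise_equal(baseline_tokens, fast_tokens):
    """Raise AssertionError at the first position where two greedy runs diverge."""
    assert len(baseline_tokens) == len(fast_tokens), (
        f"length mismatch: {len(baseline_tokens)} vs {len(fast_tokens)}"
    )
    for i, (a, b) in enumerate(zip(baseline_tokens, fast_tokens)):
        # Any divergence means the fast runtime changed at least one
        # argmax decision relative to the greedy baseline.
        assert a == b, f"divergence at position {i}: {a} != {b}"

# Hypothetical token-id streams from the baseline and accelerated runtimes:
baseline = [101, 2023, 2003, 1037, 3231, 102]
fast = [101, 2023, 2003, 1037, 3231, 102]
assert_bitwise_equal(baseline, fast)  # passes only if outputs match exactly
```

Exact token-id equality is a stricter test than comparing decoded strings, since different token sequences can detokenize to identical text.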
GitHub has moved the Copilot SDK into public preview, exposing the same agent runtime used by Copilot cloud agent and Copilot CLI. Developers can embed tool invocation, streaming, file operations, and multi-turn sessions directly into their own applications.
GitHub now lets repositories assign Dependabot alerts to Copilot, Claude, or Codex for remediation. The selected agent analyzes the advisory, opens a draft pull request, and tries to fix test failures introduced by the dependency update.
Anthropic said on April 9, 2026 that the advisor strategy is now in beta on the Claude Platform. The new tool lets Sonnet or Haiku call Opus for planning help inside a single Messages API request, which Anthropic says raised SWE-bench Multilingual scores by 2.7 points while cutting cost per task by 11.9% versus Sonnet alone.