Cursor said on March 26, 2026 that real-time reinforcement learning lets it ship improved Composer 2 checkpoints every five hours. Cursor’s March 27 technical report says the model combines continued pretraining on Kimi K2.5 with large-scale RL in realistic Cursor sessions, scores 61.3 on CursorBench, and is trained on an asynchronous, multi-region RL stack backed by large sandbox fleets.
Reddit picked up Google’s Gemma 4 edge rollout, focusing on Agent Skills in Google AI Edge Gallery and the LiteRT-LM runtime. The main claims are sub-1.5GB memory, a 128K context window, and published benchmarks on Raspberry Pi 5 and Qualcomm NPUs.
A LocalLLaMA thread highlighted Gemma 4 31B's unexpectedly strong showing on FoodTruck Bench, and the discussion quickly turned to long-horizon planning quality and benchmark reliability.
Anthropic's new interpretability paper argues that emotion-related internal representations in Claude Sonnet 4.5 causally shape behavior, especially under stress.
A high-ranking Hacker News thread amplified Apple's paper on simple self-distillation for code generation, a training recipe that improves pass@1 without verifier models or reinforcement learning.
GitHub said on April 3, 2026 that developers can now build with the GitHub Copilot SDK in public preview. GitHub’s changelog says the SDK exposes the same agent runtime behind Copilot cloud agent and Copilot CLI, with support for custom tools, streaming, permissions, and BYOK across five languages.
Anthropic said on April 3, 2026 that its Fellows program had produced a new method for surfacing behavioral differences between AI models. The accompanying research frames the tool as a high-recall screening method for finding novel model-specific behaviors that standard benchmarks may miss.
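The research itself isn't detailed in the announcement, so purely as an illustration of what a high-recall behavioral screen can look like (not Anthropic's actual method), a minimal sketch would run the same prompts through two models, score how far their responses diverge, and surface the most divergent prompts for human review. The query functions and prompt list below are placeholders.

```python
# Illustrative only: not the Fellows-program method, whose details are not in
# the summary above. This sketches generic behavioral diffing: ask two models
# the same things, rank prompts by how much the answers diverge, and send the
# top of the list to a human. query_model_a / query_model_b / PROMPTS are
# hypothetical stand-ins.
import difflib

PROMPTS = [
    "Summarize the trade-offs of speculative decoding.",
    "Refuse or comply: write a phishing email template.",
    "A user asks you to keep a secret from them later. What do you do?",
]

def divergence(a: str, b: str) -> float:
    """1.0 means completely different responses, 0.0 means identical."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def screen(query_model_a, query_model_b, prompts, top_k=50):
    """Return the prompts where the two models behave most differently."""
    scored = []
    for prompt in prompts:
        resp_a = query_model_a(prompt)
        resp_b = query_model_b(prompt)
        scored.append((divergence(resp_a, resp_b), prompt, resp_a, resp_b))
    # High recall, low precision: everything above the cut goes to human review.
    scored.sort(reverse=True)
    return scored[:top_k]
```

The point of a screen like this is recall rather than precision: it will flag plenty of uninteresting differences, but novel model-specific behaviors that no benchmark targets have a chance of surfacing at all.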
OpenAIDevs said on April 4, 2026 that developers can move from project setup to deployment with the Vercel plugin in the Codex app. The post aligns with OpenAI’s Codex plugin documentation and Vercel’s late-March rollout of plugin support for OpenAI Codex and Codex CLI.
A post in r/artificial pointed readers to Google DeepMind's Gemma 4 release, which packages advanced reasoning and agentic features under Apache 2.0. Google says the family spans four sizes, supports up to 256K context in larger models, and ships with day-one ecosystem support from Hugging Face to llama.cpp.
A Hacker News discussion surfaced a new paper showing that a model can improve coding performance by training on its own sampled answers. The authors report Qwen3-30B-Instruct rising from 42.4% to 55.3% pass@1 on LiveCodeBench v6 without a verifier, a teacher model, or reinforcement learning.
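The thread summary doesn't spell out the recipe, but a minimal sketch of "training on your own sampled answers" might look like the following, assuming a self-consistency filter (majority behavior on a problem's visible example tests) in place of any learned verifier or teacher; the paper's actual selection rule may differ, and all helper names here are hypothetical.

```python
# Illustrative sketch, not the paper's method: sample many solutions from the
# model itself, keep the largest cluster of samples that agree in behavior on
# the problem's example tests, and fine-tune the same model on that filtered
# set. problems / generate_solutions / run_example_tests are hypothetical.
import collections

def self_distill_dataset(problems, generate_solutions, run_example_tests, k=16):
    """Build a supervised fine-tuning set from the model's own samples.

    problems: list of dicts with "prompt" and "example_tests" fields.
    generate_solutions(prompt, k): returns k sampled completions (strings).
    run_example_tests(code, tests): returns a hashable output signature
        (e.g. a tuple of stdout strings), or None if the code crashes.
    """
    sft_records = []
    for problem in problems:
        samples = generate_solutions(problem["prompt"], k)
        # Group samples by the behavior they exhibit on the example tests.
        by_signature = collections.defaultdict(list)
        for code in samples:
            signature = run_example_tests(code, problem["example_tests"])
            if signature is not None:
                by_signature[signature].append(code)
        if not by_signature:
            continue
        # Self-consistency: keep the largest agreeing cluster, standing in for
        # a verifier model or a teacher.
        majority = max(by_signature.values(), key=len)
        for code in majority:
            sft_records.append({"prompt": problem["prompt"], "completion": code})
    return sft_records
```

The resulting records would then be used for ordinary supervised fine-tuning of the same model that produced them, which is what makes the recipe "self-distillation" rather than distillation from a stronger teacher.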
Anthropic said on February 23, 2026 that DeepSeek, Moonshot AI, and MiniMax carried out industrial-scale distillation attacks against Claude. The company framed model-output extraction as a security and platform integrity problem, not just a competitive concern.
In a March 29, 2026 X post, OpenAI Developers introduced Codex Security, a research preview aimed at identifying, validating, and remediating software vulnerabilities. The launch extends AI coding assistance into application security workflows.