LLM

LLM X/Twitter Apr 5, 2026 2 min read

Cursor details Composer 2’s training stack, from continued pretraining to real-world RL

Cursor said on March 26, 2026, that real-time reinforcement learning lets it ship improved Composer 2 checkpoints every five hours. Its March 27 technical report says the model combines continued pretraining on Kimi K2.5 with large-scale RL in realistic Cursor sessions, scores 61.3 on CursorBench, and is trained on an asynchronous multi-region RL stack with large sandbox fleets.

#cursor #composer-2 #reinforcement-learning

LLM Reddit Apr 5, 2026 2 min read

Reddit highlights Gemma 4’s on-device Agent Skills push

Reddit picked up Google’s Gemma 4 edge rollout, focusing on Agent Skills in Google AI Edge Gallery and the LiteRT-LM runtime. The headline claims are a memory footprint under 1.5 GB, a 128K context window, and published benchmarks on Raspberry Pi 5 and Qualcomm NPUs.

#gemma #edge-ai #on-device

LLM Reddit Apr 5, 2026 2 min read

LocalLLaMA debates Gemma 4 31B's surprising FoodTruck Bench result

A LocalLLaMA thread highlighted Gemma 4 31B's unexpectedly strong FoodTruck Bench showing, and the discussion quickly turned to long-horizon planning quality and benchmark reliability.

#llm #gemma #benchmarks

LLM Hacker News Apr 5, 2026 2 min read

HN discusses Anthropic's claim that emotion concepts inside an LLM can shape behavior

Anthropic's new interpretability paper argues that emotion-related internal representations in Claude Sonnet 4.5 causally shape behavior, especially under stress.

#llm #interpretability #anthropic

LLM Hacker News Apr 5, 2026 2 min read

HN thread spotlights a simple self-distillation recipe for stronger code generation

A high-ranking Hacker News thread amplified Apple's paper on simple self-distillation for code generation, a training recipe that improves pass@1 without verifier models or reinforcement learning.

#llm #code-generation #self-distillation

LLM X/Twitter Apr 4, 2026 2 min read

GitHub pushes Copilot SDK into public preview for custom agent apps

GitHub said on April 3, 2026, that developers can now build with the GitHub Copilot SDK in public preview. GitHub’s changelog says the SDK exposes the same agent runtime behind Copilot cloud agent and Copilot CLI, with support for custom tools, streaming, permissions, and BYOK across five languages.

#github #copilot-sdk #agents

LLM X/Twitter Apr 4, 2026 2 min read

Anthropic introduces a “diff” tool for spotting behavioral differences across AI models

Anthropic said on April 3, 2026, that its Fellows program had produced a new method for surfacing behavioral differences between AI models. The accompanying research frames the tool as a high-recall screening method for finding novel model-specific behaviors that standard benchmarks may miss.

#anthropic #model-diffing #ai-safety

LLM X/Twitter Apr 4, 2026 2 min read

OpenAI showcases Vercel plugin workflows inside the Codex app

OpenAI Developers (@OpenAIDevs) said on April 4, 2026, that developers can move from project setup to deployment with the Vercel plugin in the Codex app. The post aligns with OpenAI’s Codex plugin documentation and Vercel’s late-March rollout of plugin support for OpenAI Codex and Codex CLI.

#openai #codex #vercel

LLM Reddit Apr 4, 2026 2 min read

r/artificial Flags Gemma 4 as Google Expands Its Open-Weight Push

A post in r/artificial pointed readers to Google DeepMind's Gemma 4 release, which packages advanced reasoning and agentic features under Apache 2.0. Google says the family spans four sizes, supports up to 256K context in larger models, and ships with day-one ecosystem support from Hugging Face to llama.cpp.

#gemma #google-deepmind #open-weights

LLM Hacker News Apr 4, 2026 2 min read

Hacker News Spots a Low-Cost Route to Better Code Models

A Hacker News discussion surfaced a new paper showing that a model can improve coding performance by training on its own sampled answers. The authors report Qwen3-30B-Instruct rising from 42.4% to 55.3% pass@1 on LiveCodeBench v6 without a verifier, a teacher model, or reinforcement learning.

#llm #codegen #self-distillation

LLM X/Twitter Apr 4, 2026 1 min read

Anthropic Claims Large-Scale Distillation Attacks on Claude Involved 24,000 Accounts and 16 Million Exchanges

Anthropic said on February 23, 2026, that DeepSeek, Moonshot AI, and MiniMax carried out industrial-scale distillation attacks against Claude. The company framed model-output extraction as a security and platform integrity problem, not just a competitive concern.

#model-distillation #ai-security #claude

LLM X/Twitter Apr 4, 2026 1 min read

OpenAI Previews Codex Security for Finding, Validating, and Fixing Vulnerabilities

In a March 29, 2026 X post, OpenAI Developers introduced Codex Security, a research preview aimed at identifying, validating, and remediating software vulnerabilities. The launch extends AI coding assistance into application security workflows.

#codex-security #application-security #agentic-coding