A fast-rising LocalLLaMA post resurfaced David Noel Ng's write-up on duplicating a seven-layer block inside Qwen2-72B, a no-training architecture tweak that reportedly lifted multiple Open LLM Leaderboard benchmarks.
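The core of the tweak is depth expansion without any retraining: a contiguous block of decoder layers is copied and re-inserted so the forward pass runs it twice. A minimal sketch of the idea, with layers modeled as plain Python objects (the block indices here are hypothetical, not the ones from Ng's write-up; Qwen2-72B's actual depth of 80 layers is used for the toy list):

```python
import copy

def duplicate_block(layers, start, end):
    """Return a new layer list with layers[start:end] repeated once,
    mimicking a no-training depth-expansion ("self-merge")."""
    return (list(layers[:end])                              # prefix incl. the block
            + [copy.deepcopy(l) for l in layers[start:end]]  # duplicated copy
            + list(layers[end:]))                            # remaining suffix

# toy stand-in for Qwen2-72B's 80 decoder layers
layers = [{"idx": i} for i in range(80)]
expanded = duplicate_block(layers, 40, 47)  # duplicate a 7-layer block
print(len(expanded))  # 87
```

In a real model the same list surgery would be applied to the model's `ModuleList` of decoder blocks (with weights deep-copied), which is why the trick needs no gradient updates, only extra memory and compute per token.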
A prominent r/MachineLearning thread highlighted arXiv 2603.01919, which audits shadow APIs claiming GPT-5 and Gemini-2.5 access and reports large performance drift, unstable safety behavior, and frequent identity-verification failures.
A Launch HN thread pushed RunAnywhere's RCLI into view as an Apple Silicon-first macOS voice AI stack that combines STT, LLM, TTS, local RAG, and 38 system actions without relying on cloud APIs.
Google DeepMind said Gemini 3.1 Flash-Lite is rolling out in preview through the Gemini API and Google AI Studio. The company positioned it as the most cost-efficient Gemini 3 model, with lower price, faster performance, and tunable thinking levels.
Anthropic said Claude Code now includes Code Review, a feature that dispatches multiple agents on every pull request. The company says the feature is in research preview for Team and Enterprise, with depth-first reviews rather than lightweight skims.
A LocalLLaMA post pointed to a new Hugging Face dataset of human-written code reviews, pairing before-and-after code changes with inline reviewer comments and negative examples across 37 languages.
A Reddit post drew attention to a March 2 case study arguing that OpenClaw incidents already trigger 8 of 10 OWASP Agentic vulnerability classes, including malicious skill supply-chain attacks and localhost WebSocket hijacking.
Perplexity’s Computer account used X on March 9, 2026 to demonstrate Claude Code and GitHub CLI running directly inside Perplexity Computer. In the public demo, the system forked an OpenClaw repository, planned a fix, implemented the change, and submitted a pull request from inside the Computer environment.
GitHub used X on March 9, 2026 to resurface its guide to building reliable multi-agent systems. The company argues that most failures come from missing structure, and recommends typed schemas, action schemas, and Model Context Protocol as the core engineering controls.
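The gist of the "typed schemas" argument is that an agent's tool calls should be validated structures rather than free-form text. A minimal sketch in Python, where the action name, fields, and parser are illustrative assumptions rather than anything from GitHub's guide:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class SearchAction:
    """Hypothetical typed action schema for one agent tool."""
    tool: Literal["search"]
    query: str
    max_results: int = 5

def parse_action(raw: dict) -> SearchAction:
    """Reject malformed agent output before it ever reaches a tool."""
    if raw.get("tool") != "search":
        raise ValueError(f"unknown tool: {raw.get('tool')!r}")
    if not isinstance(raw.get("query"), str) or not raw["query"].strip():
        raise ValueError("query must be a non-empty string")
    return SearchAction(tool="search", query=raw["query"],
                        max_results=int(raw.get("max_results", 5)))

action = parse_action({"tool": "search", "query": "multi-agent reliability"})
print(action.max_results)  # 5
```

Model Context Protocol extends the same principle across process boundaries: every tool advertises a JSON schema, so the host can check each call against it before execution instead of trusting the model's raw output.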
A high-scoring LocalLLaMA post says Qwen 3.5 9B on a 16GB M1 Pro handled memory recall and basic tool calling well enough for real agent work, even though creative reasoning still trailed frontier models.
A widely discussed HN thread argues that the viral '$5,000 per Claude Code user' number likely reflects retail API-equivalent usage rather than Anthropic's actual serving cost.
Google said on February 24, 2026 that it is rolling out a new agent step in Opal for all users. The feature lets Opal choose the right tools and models for a goal, adds Memory across sessions, and pushes the product from static workflow wiring toward more interactive agentic workflows.