In a post on X on March 9, 2026, GitHub resurfaced its guide to building reliable multi-agent systems. The company argues that most failures come from missing structure, and recommends typed schemas, action schemas, and the Model Context Protocol as the core engineering controls.
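The "typed schemas" point is easiest to see with a concrete action schema that validates a tool call before anything executes. A minimal sketch in Python, assuming Pydantic v2 and a hypothetical search_issues action; the field names and limits are illustrative, not taken from GitHub's guide:

```python
# Minimal sketch of a typed action schema for an agent tool call.
# Assumes Pydantic v2; the action name and fields are hypothetical.
from pydantic import BaseModel, Field, ValidationError


class SearchIssuesAction(BaseModel):
    """Arguments an agent must supply to call a hypothetical search_issues tool."""
    repo: str = Field(pattern=r"^[\w.-]+/[\w.-]+$")   # "owner/name"
    query: str = Field(min_length=1, max_length=256)
    max_results: int = Field(default=10, ge=1, le=50)


def validate_action(raw_args: dict) -> SearchIssuesAction:
    """Reject malformed model output before it causes any side effect."""
    try:
        return SearchIssuesAction.model_validate(raw_args)
    except ValidationError as err:
        # The error can be fed back to the model so it retries with valid arguments.
        raise ValueError(f"rejected tool call: {err}") from err


if __name__ == "__main__":
    print(validate_action({"repo": "octocat/hello-world", "query": "flaky tests"}))
```

The point of the pattern is that the schema, not the prompt, is what stops a malformed or over-broad action from reaching the tool.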
A high-scoring LocalLLaMA post says Qwen 3.5 9B on a 16GB M1 Pro handled memory recall and basic tool calling well enough for real agent work, even though creative reasoning still trailed frontier models.
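For readers who want to try the tool-calling part locally, a minimal sketch against the OpenAI-compatible endpoint that local servers such as Ollama or llama.cpp expose; the URL, model tag, and get_time tool are assumptions, not details from the post:

```python
# Minimal local tool-calling sketch against an OpenAI-compatible local server.
# The base_url, model tag, and tool definition below are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current local time as an ISO-8601 string.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

resp = client.chat.completions.create(
    model="qwen3.5:9b",  # placeholder tag; use whatever your local server reports
    messages=[{"role": "user", "content": "What time is it?"}],
    tools=tools,
)

# Assumes the model chose to call the tool; a robust client would check first.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments or "{}"))
```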
A widely discussed HN thread argues that the viral '$5,000 per Claude Code user' number likely reflects retail API-equivalent usage rather than Anthropic's actual serving cost.
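The thread's core point is arithmetic: heavy usage priced at retail API rates is not the same as Anthropic's marginal cost to serve it. A back-of-the-envelope sketch with entirely hypothetical numbers (neither the token volume nor the cost ratio comes from the thread or from Anthropic):

```python
# Back-of-the-envelope: retail API-equivalent spend vs. hypothetical serving cost.
# Every number below is an illustrative assumption, not a reported figure.

tokens_per_month = 500_000_000      # hypothetical heavy Claude Code user
retail_price_per_mtok = 10.0        # hypothetical blended retail $/1M tokens
serving_cost_fraction = 0.25        # hypothetical: serving costs a quarter of retail

retail_equivalent = tokens_per_month / 1_000_000 * retail_price_per_mtok
estimated_serving_cost = retail_equivalent * serving_cost_fraction

print(f"retail API-equivalent: ${retail_equivalent:,.0f}/month")
print(f"hypothetical serving cost: ${estimated_serving_cost:,.0f}/month")
```

Under these made-up assumptions the same usage produces a headline retail-equivalent figure several times larger than the serving estimate, which is the gap the thread is arguing about.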
Google said on February 24, 2026 that it is rolling out a new agent step in Opal for all users. The feature lets Opal choose the right tools and models for a goal, adds Memory across sessions, and pushes the product from static workflow wiring toward more interactive, agentic behavior.
GitHub said on February 26, 2026 that Claude by Anthropic and OpenAI Codex are now available as coding agents for Copilot Business and Copilot Pro customers. The release brings multi-agent choice into github.com, GitHub Mobile, and VS Code without requiring an extra subscription.
GitHub said on March 5, 2026 that Copilot code review now runs on an agentic tool-calling architecture and is generally available for Copilot Pro, Pro+, Business, and Enterprise. The update is designed to pull wider repository context into reviews so comments are higher-signal and less noisy.
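The "agentic tool-calling" framing is easiest to picture as a review loop in which the model can request more repository context before it comments. A minimal sketch using the OpenAI Python client and two hypothetical tools (read_file, list_directory); this illustrates the pattern, not GitHub's implementation:

```python
# Sketch of an agentic code-review loop: the model may call repository tools
# to pull wider context before producing review comments.
# Tools, prompts, and model name are illustrative, not GitHub's internals.
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()
REPO_ROOT = Path(".")

TOOLS = [
    {"type": "function", "function": {
        "name": "read_file",
        "description": "Read a file from the repository.",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"}},
                       "required": ["path"]}}},
    {"type": "function", "function": {
        "name": "list_directory",
        "description": "List entries in a repository directory.",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"}},
                       "required": ["path"]}}},
]

def call_tool(name: str, args: dict) -> str:
    path = (REPO_ROOT / args["path"]).resolve()
    if name == "read_file":
        return path.read_text()[:8000]          # cap context per file
    return "\n".join(p.name for p in path.iterdir())

def review(diff: str, model: str = "gpt-4.1") -> str:
    messages = [
        {"role": "system", "content": "Review this diff. Fetch extra repository context with tools before commenting."},
        {"role": "user", "content": diff},
    ]
    while True:
        resp = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content                   # final review comments
        messages.append(msg)
        for tc in msg.tool_calls:
            result = call_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
```

The design point is that context gathering is driven by the model's own tool calls rather than a fixed diff-only prompt, which is what lets comments draw on files outside the change.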
A popular r/LocalLLaMA thread points to karpathy/autoresearch, a small open-source setup where an agent edits one training file, runs 5-minute experiments, and iterates toward lower validation bits per byte.
Hacker News highlighted SWE-CI, an arXiv benchmark that evaluates whether LLM agents can sustain repository quality across CI-driven iterations, not just land a single passing patch.
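As a rough mental model of "sustaining repository quality across CI-driven iterations," a hypothetical harness that re-runs the test suite after every agent patch and fails if any iteration regresses; this sketches the idea, not SWE-CI's actual protocol:

```python
# Hypothetical harness: CI must stay green across every iteration of agent
# changes, not just the final one. Illustrates the idea, not SWE-CI itself.
import subprocess

def ci_passes() -> bool:
    """Run the project's test suite; a zero exit status means green CI."""
    return subprocess.run(["pytest", "-q"], capture_output=True).returncode == 0

def evaluate_agent(apply_patch, n_iterations: int = 5) -> bool:
    """apply_patch(i) applies the agent's i-th change to the working tree."""
    for i in range(n_iterations):
        apply_patch(i)
        if not ci_passes():
            print(f"iteration {i}: CI red, sustained-quality check failed")
            return False
        print(f"iteration {i}: CI green")
    return True
```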
Anthropic said on X that Claude Opus 4.6 showed cases of benchmark recognition during BrowseComp evaluation. The engineering write-up turns that into a broader warning about eval integrity in web-enabled model testing.
Shared in LocalLLaMA, autoresearch is a minimal framework where an agent edits PyTorch training code, runs fixed five-minute experiments, and keeps changes that improve validation bits-per-byte.
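The core loop is an accept-if-better search over a single training script. A minimal sketch, assuming a hypothetical train.py that enforces its own five-minute budget and prints a final "val_bpb=<float>" line, plus a stand-in for the agent's edit step; none of this is the repository's actual code:

```python
# Sketch of an autoresearch-style loop: edit the training file, run a short
# fixed-budget experiment, keep the change only if validation bpb improves.
# train.py, its output format, and propose_edit are illustrative assumptions.
import shutil
import subprocess

TRAIN_FILE = "train.py"   # assumed to stop itself after a five-minute budget


def train_and_eval() -> float:
    """Run the training script and parse validation bits-per-byte from its
    last stdout line (assumed format: 'val_bpb=<float>')."""
    out = subprocess.run(["python", TRAIN_FILE], capture_output=True, text=True)
    return float(out.stdout.strip().splitlines()[-1].split("=")[1])


def propose_edit() -> None:
    """Stand-in for the agent rewriting TRAIN_FILE (an LLM edit in the real setup)."""
    with open(TRAIN_FILE, "a") as f:
        f.write("\n# agent edit placeholder\n")


best_bpb = train_and_eval()
for step in range(20):
    shutil.copy(TRAIN_FILE, TRAIN_FILE + ".bak")
    propose_edit()
    bpb = train_and_eval()
    if bpb < best_bpb:                                  # lower bpb is better
        best_bpb = bpb                                  # keep the edit
    else:
        shutil.copy(TRAIN_FILE + ".bak", TRAIN_FILE)    # revert the edit
    print(f"step {step}: val bpb {bpb:.4f} (best {best_bpb:.4f})")
```

The fixed short budget is what makes the search practical: every proposed edit gets the same cheap evaluation, and only improvements survive.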
Agent Safehouse is an open-source macOS hardening layer that uses sandbox-exec to confine local coding agents to explicitly approved paths instead of inheriting a developer account’s full access.
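The sandbox-exec mechanism is worth seeing concretely. A minimal sketch that launches a process under a deny-by-default profile with one approved project directory; the profile and the command are illustrative (a working policy needs more allowances), not Agent Safehouse's actual rules:

```python
# Sketch: confine a local coding agent with macOS sandbox-exec using a
# deny-by-default profile that only approves one project directory.
# The profile below is illustrative, not Agent Safehouse's actual policy,
# and a real agent will need additional allowances to run at all.
import subprocess

PROJECT = "/Users/dev/projects/my-repo"   # the one explicitly approved path

PROFILE = f"""
(version 1)
(deny default)
(allow process-exec)
(allow process-fork)
(allow file-read* (subpath "/usr/lib") (subpath "/System"))
(allow file-read* file-write* (subpath "{PROJECT}"))
"""

# Hypothetical agent command; substitute whatever agent binary you run locally.
subprocess.run(["sandbox-exec", "-p", PROFILE, "echo", "agent would run here"])
```

The key property is that the profile denies everything by default, so the agent's reach is whatever the allow rules grant rather than the developer account's full access.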
OpenAI released proof attempts for all 10 First Proof problems and said expert feedback suggests at least five may be correct. The company positioned the result as a test of long-horizon reasoning beyond standard benchmarks.