Cursor said on March 25, 2026 that cloud agents can now run on customer infrastructure while preserving the same agent harness and workflow experience. Cursor's product post says the generally available setup keeps code, tool execution, and build artifacts inside the customer's network while still giving agents isolated remote environments, multi-model support, and plugin/MCP extensibility.
AnthropicAI highlighted an Engineering Blog post on March 24, 2026 about using a multi-agent harness to keep Claude productive across frontend and long-running software engineering tasks. The underlying Anthropic post explains how initializer agents, incremental coding sessions, progress logs, structured feature lists, and browser-based testing can reduce context-window drift and premature task completion.
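To make the loop concrete, here is a minimal sketch of that kind of harness, assuming a JSON feature list and an append-only log; the file names and the `run_coding_session` stub are hypothetical stand-ins, not Anthropic's implementation.

```python
import json
from pathlib import Path

FEATURES_PATH = Path("features.json")  # structured feature list (hypothetical)
PROGRESS_PATH = Path("progress.log")   # append-only progress log (hypothetical)

def run_coding_session(feature: dict) -> bool:
    """Stub for one incremental agent session; returns whether tests passed."""
    return True  # replace with a real agent invocation

def next_feature(features: list[dict]) -> dict | None:
    return next((f for f in features if f["status"] != "done"), None)

def harness_loop(max_sessions: int = 50) -> None:
    for _ in range(max_sessions):
        features = json.loads(FEATURES_PATH.read_text())
        feature = next_feature(features)
        if feature is None:
            break  # every feature verified done; guards against premature completion
        passed = run_coding_session(feature)
        # The log and feature list, not the model's context window, carry
        # state between sessions, which is what limits context drift.
        with PROGRESS_PATH.open("a") as log:
            log.write(f"{feature['id']}: {'pass' if passed else 'fail'}\n")
        if passed:
            feature["status"] = "done"
            FEATURES_PATH.write_text(json.dumps(features, indent=2))
```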
A popular r/LocalLLaMA post revived attention around Google Research’s TurboQuant by tying it directly to local inference constraints. The method’s reported 3-bit KV cache compression and 6x memory reduction make it relevant well beyond research headlines, but its practical value will depend on whether it reaches real deployment stacks.
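For scale, a generic 3-bit round-trip (plain per-row uniform quantization, not necessarily TurboQuant's actual algorithm) shows where the memory ratio comes from:

```python
import numpy as np

def quantize_3bit(x: np.ndarray):
    """Per-row asymmetric uniform quantization to 3 bits (8 levels)."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / 7.0 + 1e-12  # 2**3 - 1 quantization levels
    q = np.clip(np.round((x - lo) / scale), 0, 7).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

kv = np.random.randn(32, 128).astype(np.float32)  # toy KV rows
q, scale, lo = quantize_3bit(kv)
print(f"max abs error: {np.abs(dequantize(q, scale, lo) - kv).max():.3f}")
# A 16-bit to 3-bit payload is ~5.3x; the reported ~6x presumably reflects
# the full method and its baseline, not this toy payload arithmetic.
print(f"payload ratio: {16 / 3:.1f}x")
```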
A post on r/MachineLearning argues that LoCoMo’s leaderboard is being treated with more confidence than its evaluation setup deserves. The audit claims the benchmark has a 6.4% ground-truth error rate and that its judge accepts intentionally wrong but topically adjacent answers far too often, turning attention from raw scores to benchmark reliability.
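A back-of-envelope calculation shows why both findings matter; everything here except the audited 6.4% rate is a hypothetical input:

```python
label_error = 0.064    # ground-truth error rate from the audit
true_accuracy = 0.80   # hypothetical model
lenient_accept = 0.30  # hypothetical: judge passes 30% of wrong answers

# Simplified model: a perfect system still loses points on mislabeled items,
# while a lenient judge inflates scores by passing topically adjacent answers.
ceiling = 1 - label_error
measured = true_accuracy + (1 - true_accuracy) * lenient_accept
print(f"score ceiling under label noise: {ceiling:.1%}")   # 93.6%
print(f"measured vs true accuracy: {measured:.1%} vs {true_accuracy:.1%}")
```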
A Hacker News post pushed ATLAS into the spotlight by framing a consumer-GPU coding agent as a serious cost challenger to hosted systems. The headline benchmark is interesting, but the repository itself makes clear that its 74.6% result is not a controlled head-to-head against Claude 4.5 Sonnet because the task counts and evaluation protocols differ.
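Even setting protocol differences aside, a quick binomial check with hypothetical task counts shows how wide the uncertainty band around a single pass rate is:

```python
import math

def ci95(p: float, n: int) -> float:
    """95% normal-approximation half-width for a pass rate p over n tasks."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

p = 0.746
for n in (200, 500):  # hypothetical task counts for the two evals
    print(f"n={n}: 74.6% +/- {ci95(p, n):.1%}")
# A few hundred tasks leaves a band of several points, before accounting for
# differing harnesses, scaffolding, or scoring protocols.
```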
A Hacker News discussion around the `.claude` folder guide frames Claude Code configuration as versioned project infrastructure rather than repeated prompt setup. The breakdown of `CLAUDE.md`, rules, commands, skills, and agents shows how teams can standardize workflows, but it also creates a new governance layer for instructions.
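As an illustration of the idea (the layout below is assembled from the components the guide names, not a canonical spec), a versioned setup might look like:

```
repo/
├── CLAUDE.md              # project-wide instructions, committed like code
└── .claude/
    ├── settings.json      # shared settings and permissions
    ├── rules/             # standing rules referenced from CLAUDE.md
    ├── commands/
    │   └── review.md      # a custom slash command
    ├── skills/
    │   └── deploy/SKILL.md
    └── agents/
        └── test-writer.md # a subagent definition
```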
A March 26, 2026 r/LocalLLaMA post about serving Qwen 3.5 27B on Google Cloud B200 clusters reached 205 points and 52 comments at crawl time. The linked write-up reports 1,103,941 total tokens per second on 12 nodes after switching from tensor to data parallelism, shrinking context length, enabling FP8 KV cache, and using MTP-1 speculative decoding.
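Dividing out the headline number is straightforward; the GPUs-per-node count below is an assumption, since the post's node configuration is not restated here:

```python
total_tps = 1_103_941  # reported total tokens per second
nodes = 12
gpus_per_node = 8      # assumption: typical B200 node, not stated in the post

print(f"per node: {total_tps / nodes:,.0f} tok/s")                    # ~92,000
print(f"per GPU:  {total_tps / (nodes * gpus_per_node):,.0f} tok/s")  # ~11,500
```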
A March 26, 2026 r/LocalLLaMA post linking NVIDIA's `gpt-oss-puzzle-88B` model card reached 284 points and 105 comments at crawl time. NVIDIA says the 88B MoE model uses its Puzzle post-training NAS pipeline to cut parameters and KV-cache costs while keeping reasoning accuracy near or above the parent model.
A March 27, 2026 Hacker News post linking Claude Code's new scheduling docs reached 282 points and 230 comments at crawl time. Anthropic says scheduled tasks run on Anthropic-managed infrastructure, can clone GitHub repos into fresh sessions, and are available to Pro, Max, Team, and Enterprise users.
Together Research said on March 27, 2026 that a smaller model using divide-and-conquer can match or outperform GPT-4o on long-context tasks, with the work accepted at ICLR 2026. Together's blog and the arXiv paper say the method uses a planner-worker-manager pipeline and explains long-context failures in terms of task, model, and aggregator noise.
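A minimal sketch of that pipeline shape, with a placeholder `llm` callable standing in for any model API (this is not Together's code), shows where each noise source enters:

```python
from typing import Callable

def chunk(text: str, size: int = 4000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def divide_and_conquer(question: str, document: str,
                       llm: Callable[[str], str]) -> str:
    # Planner: turn the question into a per-chunk subtask (task noise enters here).
    plan = llm(f"Break this question into a per-chunk subtask: {question}")
    # Workers: each call sees only one chunk, staying well under the small
    # model's context limit (model noise enters here).
    partials = [
        llm(f"Subtask: {plan}\nChunk:\n{c}\nAnswer from this chunk only.")
        for c in chunk(document)
    ]
    # Manager: combine partial answers (aggregator noise enters here, even
    # when every worker is individually correct).
    return llm(
        f"Question: {question}\nPartial answers:\n" + "\n".join(partials)
        + "\nCombine these into one final answer."
    )
```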
OpenAI Devs said on March 26, 2026 that plugins are rolling out in Codex, letting the agent work with common tools such as Slack, Figma, Notion, and Gmail. OpenAI's Codex docs describe plugins as reusable bundles that package skills, app integrations, and MCP server settings, turning Codex into a more shareable workflow layer for teams.
A r/LocalLLaMA self-post shared an open-source TurboQuant implementation for llama.cpp that skips value dequantization when attention weights are negligible. The author reports a 22.8% decode gain at 32K context on Qwen3.5-35B-A3B running on an Apple M5 Max, with unchanged perplexity and better needle-in-a-haystack retrieval.
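The trick reduces to a cheap branch in the attention inner loop. This NumPy sketch shows the shape of it with a made-up threshold and a generic quantization layout; the real change lives in llama.cpp's kernels:

```python
import numpy as np

THRESH = 1e-4  # hypothetical negligibility cutoff, not the author's value

def attend(probs: np.ndarray, v_q: np.ndarray, scale: np.ndarray,
           lo: np.ndarray) -> np.ndarray:
    """probs: (T,) softmax weights; v_q: (T, D) quantized V rows with
    per-row scale/lo. Rows with negligible weight are never dequantized."""
    out = np.zeros(v_q.shape[-1], dtype=np.float32)
    for t, p in enumerate(probs):
        if p < THRESH:
            continue  # skip dequantization and accumulation entirely
        out += p * (v_q[t].astype(np.float32) * scale[t] + lo[t])
    return out

# Toy usage: a long context produces many near-zero attention weights,
# which is exactly where the skipped work comes from.
T, D = 1024, 64
probs = np.random.dirichlet(np.ones(T)).astype(np.float32)
v_q = np.random.randint(0, 8, size=(T, D), dtype=np.uint8)
scale = np.full(T, 0.1, dtype=np.float32)
lo = np.full(T, -0.35, dtype=np.float32)
print(attend(probs, v_q, scale, lo)[:4])
```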