A March 27, 2026 Hacker News post linking Claude Code's new scheduling docs reached 282 points and 230 comments at crawl time. Anthropic says scheduled tasks run on Anthropic-managed infrastructure, can clone GitHub repos into fresh sessions, and are available to Pro, Max, Team, and Enterprise users.
LLM
Together Research said on March 27, 2026 that a smaller model using divide-and-conquer can match or outperform GPT-4o on long-context tasks, with the work accepted at ICLR 2026. Together's blog and the arXiv paper say the method uses a planner-worker-manager pipeline and explains long-context failures in terms of task, model, and aggregator noise.
OpenAI Devs said on March 26, 2026 that plugins are rolling out in Codex, letting the agent work with common tools such as Slack, Figma, Notion, and Gmail. OpenAI's Codex docs describe plugins as reusable bundles that package skills, app integrations, and MCP server settings, turning Codex into a more shareable workflow layer for teams.
A LocalLLaMA self-post shared an open-source TurboQuant implementation for llama.cpp that skips value dequantization when attention weights are negligible. The author reports a 22.8% decode gain at 32K context with Qwen3.5-35B-A3B on an Apple M5 Max, with unchanged perplexity and better needle-in-a-haystack retrieval.
A Show HN post introduced Cog, a file-based memory architecture for Claude Code that stores working memory, archival context, and self-maintenance workflows as markdown and git-visible conventions. Instead of adding a database or runtime, it treats the filesystem, CLAUDE.md, and scheduled skills as the interface for persistent agent memory.
Ollama said on March 26, 2026 that VS Code now integrates with Ollama via GitHub Copilot. Ollama docs say VS Code 1.113+, GitHub Copilot Chat 0.41.0+, and Ollama v0.18.3+ let users load local or cloud Ollama models into the Copilot model picker, with GitHub Copilot Free sufficient for custom model selection.
OpenAI Devs said on March 27, 2026 that Codex usage limits had been reset across plans so users could try newly launched plugins. OpenAI's Help Center says Codex is temporarily available on Free and Go, paid plans are getting 2x rate limits, and plugins package reusable workflows built from skills, app integrations, and MCP configurations.
David Noel Ng's follow-up post treats layer duplication as a search problem rather than a lucky trick, then ties it to multilingual hidden-state evidence that the middle of the network may host a shared reasoning space.
The Reddit thread focused on a practical claim with real systems implications: replace TurboQuant's dense rotation with structured rotor math, keep attention fidelity close to the original, and make the kernel much cheaper on NVIDIA and Apple hardware.
Google introduced Gemini 3.1 Flash Live on March 26, 2026 as its new real-time audio model for developers, enterprises, and consumer products. The release ties together the Gemini Live API, Gemini Enterprise for Customer Experience, Search Live, and Gemini Live around a single lower-latency voice stack.
The LocalLLaMA thread climbed because it translated Intel workstation GPU news into the metrics local inference users actually watch: VRAM, bandwidth, software support, and cost-per-model.
A detailed engineering write-up resonated on Hacker News because it treated production RAG as a data and operations problem, not a prompt demo.