A Show HN post introduced Cog, a file-based memory architecture for Claude Code that stores working memory, archival context, and self-maintenance workflows as markdown and git-visible conventions. Instead of adding a database or runtime, it treats the filesystem, CLAUDE.md, and scheduled skills as the interface for persistent agent memory.
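The post doesn't reproduce Cog's exact tree, but a file-based memory convention of the kind described might look like the following sketch. All file and directory names here are hypothetical illustrations, not Cog's actual layout:

```
project/
├── CLAUDE.md           # entry point: instructions plus pointers into memory
├── memory/
│   ├── working.md      # current-task scratchpad (working memory)
│   └── archive/        # dated snapshots of retired context (archival memory)
└── skills/
    └── maintain.md     # scheduled self-maintenance workflow, e.g. compaction
```

Because everything is plain markdown under git, memory edits show up as ordinary diffs and can be reviewed like any other change.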
OpenAI says GPT-5.4 is its most capable and efficient frontier model for professional work, with stronger reasoning, coding, and computer use. The release spans ChatGPT, the API, and Codex, and pushes the context window to 1 million tokens.
Ollama announced on March 26, 2026 that VS Code now integrates with Ollama via GitHub Copilot. Per the Ollama docs, VS Code 1.113+, GitHub Copilot Chat 0.41.0+, and Ollama v0.18.3+ let users load local or cloud Ollama models into the Copilot model picker, and the GitHub Copilot Free plan is sufficient for custom model selection.
OpenAIDevs said on March 27, 2026 that Codex usage limits had been reset across plans so users could try newly launched plugins. OpenAI's Help Center says Codex is temporarily available on Free and Go, paid plans are getting 2x rate limits, and plugins package reusable workflows built from skills, app integrations, and MCP configurations.
David Noel Ng's follow-up post treats layer duplication as a search problem rather than a lucky trick, then ties it to multilingual hidden-state evidence that the middle of the network may host a shared reasoning space.
The Reddit thread focused on a practical claim with real systems implications: replace TurboQuant's dense rotation with structured rotor math, keep attention fidelity close, and make the kernel much cheaper on NVIDIA and Apple hardware.
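The thread's exact rotor construction isn't reproduced here, but the cost argument generalizes: a structured orthogonal transform applies a rotation in O(n log n) where a dense rotation matrix costs O(n^2). As an illustration, here is a minimal normalized Walsh-Hadamard rotation in Python, the structured transform used by rotation-based quantization schemes such as QuaRot; the function name `fwht` is my own choice, not TurboQuant's API:

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Normalized fast Walsh-Hadamard transform: a structured orthogonal
    rotation in O(n log n), versus O(n^2) for a dense rotation matrix."""
    y = x.astype(np.float64).copy()
    n = len(y)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # butterfly stage: combine pairs h apart
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y / np.sqrt(n)

# Why rotate before quantizing: a rotation spreads outlier mass across
# coordinates, shrinking the dynamic range a low-bit quantizer must cover.
v = np.zeros(8)
v[0] = 8.0                 # worst case: all mass in one channel
r = fwht(v)
print(np.linalg.norm(r))   # norm preserved by the orthogonal transform
print(r.max() - r.min())   # per-channel range is much smaller than v's
```

Because the normalized Hadamard matrix is symmetric and orthogonal, applying `fwht` twice recovers the input, which makes the rotation exactly invertible at dequantization time.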
Google introduced Gemini 3.1 Flash Live on Mar 26, 2026 as its new real-time audio model for developers, enterprises, and consumer products. The release ties together the Gemini Live API, Gemini Enterprise for Customer Experience, Search Live, and Gemini Live around a single lower-latency voice stack.
The LocalLLaMA thread climbed because it translated Intel workstation GPU news into the metrics local inference users actually watch: VRAM, bandwidth, software support, and cost-per-model.
A detailed engineering write-up resonated on Hacker News because it treated production RAG as a data and operations problem, not a prompt demo.
A FutureSearch incident transcript moved quickly through Hacker News because it showed, minute by minute, how a poisoned LiteLLM package reached a workstation and was isolated within 72 minutes.
GitHub now lets users mention @copilot in a pull request to request changes on that same PR. The company says Copilot coding agent handles the work in a cloud development environment, runs tests and linting, then pushes updates; pull requests from forks are not yet supported.
Vercel introduced a rebuilt v0 positioned for production apps and agents rather than demo-only prototyping. The release adds repo import into a sandbox runtime, git-native branch and pull-request workflows, secure Snowflake and AWS database integrations, and enterprise-grade security controls.