A Show HN post introduced Cog, a file-based memory architecture for Claude Code that stores working memory, archival context, and self-maintenance workflows as markdown and git-visible conventions. Instead of adding a database or runtime, it treats the filesystem, CLAUDE.md, and scheduled skills as the interface for persistent agent memory.
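The post doesn't reproduce Cog's exact tree, but a file-based memory convention of the kind described might look like the following sketch. All file and directory names here are hypothetical illustrations, not Cog's actual layout:

```
project/
├── CLAUDE.md           # entry point: instructions plus pointers into memory
├── memory/
│   ├── working.md      # current-task scratchpad (working memory)
│   └── archive/        # dated snapshots of retired context (archival memory)
└── skills/
    └── maintain.md     # scheduled self-maintenance workflow, e.g. compaction
```

Because everything is plain markdown under git, memory edits show up as ordinary diffs and can be reviewed like any other change.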
OpenAI says GPT-5.4 is its most capable and efficient frontier model for professional work, with stronger reasoning, coding, and computer use. The release spans ChatGPT, the API, and Codex, and pushes the context window to 1 million tokens.
Ollama announced on March 26, 2026 that VS Code now integrates with Ollama via GitHub Copilot. Per the Ollama docs, VS Code 1.113+, GitHub Copilot Chat 0.41.0+, and Ollama v0.18.3+ let users load local or cloud Ollama models into the Copilot model picker, and the GitHub Copilot Free plan is sufficient for custom model selection.
OpenAIDevs said on March 27, 2026 that Codex usage limits had been reset across plans so users could try newly launched plugins. OpenAI's Help Center says Codex is temporarily available on Free and Go, paid plans are getting 2x rate limits, and plugins package reusable workflows built from skills, app integrations, and MCP configurations.
David Noel Ng's follow-up post treats layer duplication as a search problem rather than a lucky trick, then ties it to multilingual hidden-state evidence that the middle of the network may host a shared reasoning space.
The Reddit thread focused on a practical claim with real systems implications: replace TurboQuant's dense rotation with structured rotor math, keep attention fidelity close, and make the kernel much cheaper on NVIDIA and Apple hardware.
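The thread's exact rotor construction isn't reproduced here, but the cost argument generalizes: a structured orthogonal transform applies a rotation in O(n log n) where a dense rotation matrix costs O(n^2). As an illustration, here is a minimal normalized Walsh-Hadamard rotation in Python, the structured transform used by rotation-based quantization schemes such as QuaRot; the function name `fwht` is my own choice, not TurboQuant's API:

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Normalized fast Walsh-Hadamard transform: a structured orthogonal
    rotation in O(n log n), versus O(n^2) for a dense rotation matrix."""
    y = x.astype(np.float64).copy()
    n = len(y)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # butterfly stage: combine pairs h apart
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y / np.sqrt(n)

# Why rotate before quantizing: a rotation spreads outlier mass across
# coordinates, shrinking the dynamic range a low-bit quantizer must cover.
v = np.zeros(8)
v[0] = 8.0                 # worst case: all mass in one channel
r = fwht(v)
print(np.linalg.norm(r))   # norm preserved by the orthogonal transform
print(r.max() - r.min())   # per-channel range is much smaller than v's
```

Because the normalized Hadamard matrix is symmetric and orthogonal, applying `fwht` twice recovers the input, which makes the rotation exactly invertible at dequantization time.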
Google introduced Gemini 3.1 Flash Live on Mar 26, 2026 as its new real-time audio model for developers, enterprises, and consumer products. The release ties together the Gemini Live API, Gemini Enterprise for Customer Experience, Search Live, and Gemini Live around a single lower-latency voice stack.
The LocalLLaMA thread climbed because it translated Intel workstation GPU news into the metrics local inference users actually watch: VRAM, bandwidth, software support, and cost-per-model.
A detailed engineering write-up resonated on Hacker News because it treated production RAG as a data and operations problem, not a prompt demo.
A FutureSearch incident transcript moved quickly through Hacker News because it showed, minute by minute, how a poisoned LiteLLM package reached a workstation and was isolated within 72 minutes.
GitHub now lets users mention @copilot in a pull request to request changes on that same PR. The company says Copilot coding agent handles the work in a cloud development environment, runs tests and linting, then pushes updates; pull requests from forks are not yet supported.
Vercel introduced a rebuilt v0 positioned for production apps and agents rather than demo-only prototyping. The release adds repo import into a sandbox runtime, git-native branch and pull-request workflows, secure Snowflake and AWS database integrations, and enterprise-grade security controls.