Hacker News Picks Up Karpathy's "LLM Wiki" Pattern for Persistent Knowledge Bases
Original: LLM Wiki – example of an "idea file"
From query-time retrieval to a maintained knowledge layer
Andrej Karpathy’s LLM Wiki gist, posted on April 4, 2026, starts from a simple complaint: most document workflows still look like RAG. You upload files, the model retrieves relevant chunks at query time, and the answer is rebuilt from scratch on every request. At the time of crawling, the Hacker News thread around the gist had 274 points and 89 comments, with readers treating it less as a note-taking trick and more as an architectural pattern for agent workflows.
The gist argues for a different intermediate layer. Instead of repeatedly retrieving from raw documents, an LLM should incrementally build and maintain a persistent wiki made of interlinked markdown pages. When a new source arrives, the agent should not merely index it. It should update topic summaries, revise entity pages, flag contradictions, add cross-links, and strengthen the running synthesis. In that model, the wiki becomes a compiled artifact that keeps getting better over time rather than a transient answer assembled on demand.
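To make the update step concrete, here is a minimal sketch of what "ingest" could look like on disk. The gist does not prescribe an implementation; the file names index.md and log.md come from the gist, but the function, its slug scheme, and the [[wikilink]] syntax are assumptions for illustration.

```python
import datetime
import pathlib

def ingest(wiki_dir: str, title: str, summary: str) -> pathlib.Path:
    """Hypothetical ingest step: write a summary page, update the
    content map (index.md), and append to the chronological log
    (log.md). Only those two file names follow the gist; the rest
    is illustrative."""
    wiki = pathlib.Path(wiki_dir)
    wiki.mkdir(parents=True, exist_ok=True)

    # 1. Write the LLM-authored summary as its own markdown page.
    slug = title.lower().replace(" ", "-")
    page = wiki / f"{slug}.md"
    page.write_text(f"# {title}\n\n{summary}\n")

    # 2. Add the page to the content-oriented map of the wiki.
    with (wiki / "index.md").open("a") as f:
        f.write(f"- [[{slug}]]\n")

    # 3. Record the event in the chronological log.
    stamp = datetime.date.today().isoformat()
    with (wiki / "log.md").open("a") as f:
        f.write(f"- {stamp}: ingested '{title}'\n")
    return page
```

A real implementation would also revise related entity pages and flag contradictions, which requires the LLM in the loop; the point of the sketch is only that each ingest mutates a persistent artifact instead of producing a throwaway answer.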
The three layers and the operating loop
Karpathy frames the system in three layers. The first is raw sources: immutable articles, papers, transcripts, images, or datasets that remain the ground truth. The second is the wiki, a directory of LLM-authored markdown pages containing summaries, concepts, entities, comparisons, and broader syntheses. The third is the schema, a rules document such as AGENTS.md or CLAUDE.md that tells the agent how the wiki should be structured and maintained.
On top of that, he describes three core operations. Ingest means reading a new source, discussing it, writing a summary, updating the index, touching related pages, and appending to the log. Query means answering questions against the wiki itself, then optionally filing the resulting analysis back into the knowledge base as a new page. Lint means periodically checking for contradictions, stale claims, orphan pages, weak cross-references, or missing concepts. Two special files, index.md and log.md, help navigation by separating the content-oriented map of the wiki from the chronological record of how it evolved.
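The query operation can be sketched the same way: answer against the wiki's own pages rather than the raw sources, then optionally file the analysis back in. The `ask_llm` callable and the `query-notes.md` file name are assumptions; the gist names no model API or page layout.

```python
import pathlib

def query(wiki_dir: str, question: str, ask_llm, file_back: bool = False) -> str:
    """Hypothetical query step: build context from wiki pages (not
    raw sources), ask the model, and optionally write the analysis
    back into the knowledge base as a new page. `ask_llm` is an
    assumed callable mapping a prompt string to an answer string."""
    wiki = pathlib.Path(wiki_dir)
    # index.md and log.md are navigation files, not content pages.
    context = "\n\n".join(
        p.read_text() for p in sorted(wiki.glob("*.md"))
        if p.name not in ("index.md", "log.md")
    )
    answer = ask_llm(f"Context:\n{context}\n\nQuestion: {question}")
    if file_back:
        # File the resulting analysis back as a page of its own.
        (wiki / "query-notes.md").write_text(f"# {question}\n\n{answer}\n")
    return answer
```

The detail worth noticing is that the context comes from the compiled wiki layer, so repeated questions benefit from every prior ingest instead of re-retrieving from raw documents.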
Why the idea resonates
The practical appeal is that it recasts the LLM as a maintenance engine rather than only a retrieval layer. Karpathy explicitly describes Obsidian as the IDE, the LLM as the programmer, and the wiki as the codebase. That analogy lands because the tedious part of knowledge management is not thinking. It is cross-linking pages, updating summaries, tracking contradictions, and keeping structure coherent across dozens or hundreds of files. Those are exactly the repetitive bookkeeping tasks that humans avoid and LLM agents can absorb.
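One of those bookkeeping tasks, detecting orphan pages, is simple enough to show directly. This sketch assumes Obsidian-style [[wikilinks]], which the gist's IDE analogy suggests but does not mandate; the function name and regex are illustrative.

```python
import pathlib
import re

# Matches [[page]] and [[page|alias]]; captures the target name.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki_dir: str) -> set[str]:
    """One illustrative lint check: content pages that no other
    page links to. Assumes one flat directory of markdown pages
    using [[wikilink]] syntax."""
    wiki = pathlib.Path(wiki_dir)
    pages = {p.stem for p in wiki.glob("*.md")}
    linked = set()
    for p in wiki.glob("*.md"):
        for target in WIKILINK.findall(p.read_text()):
            linked.add(target.strip())
    # index.md and log.md are navigation files, not content pages.
    return pages - linked - {"index", "log"}
```

Checks for stale claims or contradictions need the LLM itself, but structural lints like this one are exactly the cheap, repetitive passes an agent can run on every ingest.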
The gist stays intentionally abstract, which is part of why Hacker News responded to it. It is not pitching one fixed implementation. It is a pattern that can fit personal research, reading notes, due diligence, internal team wikis, or long-running hobby projects. The underlying bet is that once maintenance becomes cheap enough, a wiki can stop being a graveyard of abandoned notes and become a living interface between the user and their accumulated sources.
Sources: Karpathy gist, Hacker News discussion