Document poisoning in RAG systems shows why ingestion controls matter more than output filters
Original: Document poisoning in RAG systems: How attackers corrupt AI's sources View original →
Hacker News picked up Amine Raji’s local lab showing how document poisoning works in a small ChromaDB-based RAG stack. Using LM Studio, Qwen2.5-7B-Instruct, and five clean documents, he injected three fake “corrections” into the corpus and then asked a normal question about company finances. The system started reporting fabricated numbers like $8.3M revenue and restructuring plans even though the genuine $24.7M Q4 report was still in the collection.
The technical point is that the attack does not need to jailbreak the model or rewrite the prompt. It wins earlier, at retrieval and source weighting. The poisoned documents reuse the same financial vocabulary as the legitimate record but wrap it in authority language such as “CFO-approved correction” and “board update.” In the author’s measurements the attack succeeded on 19 of 20 runs, and the post explicitly notes that the five-document corpus is a best case for the attacker rather than a realistic enterprise-scale benchmark.
The defensive result that got the most attention was embedding anomaly detection at ingestion. Raji says prompt hardening and output monitoring helped only modestly, while checking new documents for unusually dense semantic overlap cut standalone attack success from 95% to 20%. With five layers combined, residual success fell to 10%. That shifts the security conversation away from “can the model refuse a bad answer?” toward “who can write into the knowledge base and how are those writes screened?”
HN commenters pushed on the assumptions. Several noted that write access to the corpus is already privileged, and others argued that better citation UX or stricter provenance could make this class of failure easier to detect. Even with that caveat, the thread lands on a useful conclusion: once RAG systems connect Confluence, Slack, SharePoint, or internal document pipelines, ingestion becomes an attack surface of its own. Original source: Amine Raji. Community discussion: Hacker News.
Related Articles
OpenAI Developers said on March 6, 2026 that Codex Security is now in research preview. The product connects to GitHub repositories, builds a threat model, validates potential issues in isolation, and proposes patches for human review.
A Launch HN thread pulled RunAnywhere’s MetalRT and RCLI into focus, centering attention on a low-latency STT-LLM-TTS stack that runs on Apple Silicon without cloud APIs.
A Launch HN thread pushed RunAnywhere's RCLI into view as an Apple Silicon-first macOS voice AI stack that combines STT, LLM, TTS, local RAG, and 38 system actions without relying on cloud APIs.
Comments (0)
No comments yet. Be the first to comment!