Document poisoning in RAG systems shows why ingestion controls matter more than output filters

Original: Document poisoning in RAG systems: How attackers corrupt AI's sources View original →

Read in other languages: 한국어日本語
LLM Mar 13, 2026 By Insights AI (HN) 2 min read 1 views Source

Hacker News picked up Amine Raji’s local lab showing how document poisoning works in a small ChromaDB-based RAG stack. Using LM Studio, Qwen2.5-7B-Instruct, and five clean documents, he injected three fake “corrections” into the corpus and then asked a normal question about company finances. The system started reporting fabricated numbers like $8.3M revenue and restructuring plans even though the genuine $24.7M Q4 report was still in the collection.

The technical point is that the attack does not need to jailbreak the model or rewrite the prompt. It wins earlier, at retrieval and source weighting. The poisoned documents reuse the same financial vocabulary as the legitimate record but wrap it in authority language such as “CFO-approved correction” and “board update.” In the author’s measurements the attack succeeded on 19 of 20 runs, and the post explicitly notes that the five-document corpus is a best case for the attacker rather than a realistic enterprise-scale benchmark.

The defensive result that got the most attention was embedding anomaly detection at ingestion. Raji says prompt hardening and output monitoring helped only modestly, while checking new documents for unusually dense semantic overlap cut standalone attack success from 95% to 20%. With five layers combined, residual success fell to 10%. That shifts the security conversation away from “can the model refuse a bad answer?” toward “who can write into the knowledge base and how are those writes screened?”

HN commenters pushed on the assumptions. Several noted that write access to the corpus is already privileged, and others argued that better citation UX or stricter provenance could make this class of failure easier to detect. Even with that caveat, the thread lands on a useful conclusion: once RAG systems connect Confluence, Slack, SharePoint, or internal document pipelines, ingestion becomes an attack surface of its own. Original source: Amine Raji. Community discussion: Hacker News.

Share:

Related Articles

Comments (0)

No comments yet. Be the first to comment!

Leave a Comment

© 2026 Insights. All rights reserved.