Miasma Climbs Hacker News with a Rust Trap for AI Web Scrapers
Original: Miasma: A tool to trap AI web scrapers in an endless poison pit
A defensive answer to training-data scraping
A March 2026 Hacker News post about Miasma reached 187 points and 136 comments at crawl time. The idea is intentionally adversarial. Instead of merely blocking unwanted AI scrapers, Miasma lets publishers divert suspicious bot traffic into a separate server that serves poisoned training data and recursive links. The project is written in Rust and presents itself as a fast, low-footprint way to push back against large-scale web harvesting.
The README frames the tool around a complaint that has become common among independent site operators: AI companies scrape public websites at enormous scale, often without meaningful consent or compensation. Miasma’s answer is not a classic CAPTCHA or rate limit. It is a decoy environment. A scraper that follows hidden links lands in an endless loop of synthetic material rather than the real pages the operator wants to protect.
How the trap works in practice
The documented deployment pattern is straightforward. A site owner embeds hidden links that point to a path such as /bots. Human visitors do not see those links because they are hidden with CSS and accessibility attributes, but automated scrapers still discover them. An Nginx reverse proxy then routes requests under that path to a running Miasma instance.
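A minimal sketch of that pattern follows. The /bots path and port 9855 come from the project's own example; the exact markup and proxy settings here are illustrative, not copied from the README:

```html
<!-- Hidden trap link: invisible to human visitors, but discoverable by
     crawlers that parse raw HTML without applying CSS or ARIA semantics -->
<a href="/bots/" style="display:none" aria-hidden="true" tabindex="-1">archive</a>
```

```nginx
# Route everything under the trap prefix to the local Miasma instance
location /bots {
    proxy_pass http://127.0.0.1:9855;
}
```

Any crawler that ignores the hiding attributes and follows the link is handed off to Miasma; ordinary visitors never see the entry point.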
Once a bot enters the trap, Miasma serves poisoned content from an upstream source along with multiple self-referential links, so the crawler keeps walking in circles. The project's example starts the service with miasma --link-prefix '/bots' -p 9855 -c 50. According to the README, at 50 maximum in-flight requests the server should peak at roughly 50 to 60 MB of memory, and any request above that limit gets a 429 response instead of being queued. That operational detail matters: the tool is meant to be irritating for bots without becoming expensive for the site owner.
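The load-shedding behavior described above can be sketched as a simple concurrency gate. This is an illustration of the design choice (reject excess requests rather than queue them), not Miasma's actual implementation; the ConcurrencyGate type is invented for the example:

```rust
use std::sync::Mutex;

// Illustrative sketch of a concurrency cap that sheds load with a 429
// status instead of queueing, mirroring the behavior the README describes
// for the -c flag.
struct ConcurrencyGate {
    in_flight: Mutex<u32>,
    max: u32,
}

impl ConcurrencyGate {
    fn new(max: u32) -> Self {
        Self { in_flight: Mutex::new(0), max }
    }

    /// Run `serve` if a slot is free; otherwise refuse immediately with 429.
    fn handle<F: FnOnce() -> u16>(&self, serve: F) -> u16 {
        {
            let mut n = self.in_flight.lock().unwrap();
            if *n >= self.max {
                return 429; // over the cap: fail fast, don't queue
            }
            *n += 1;
        }
        let status = serve();
        *self.in_flight.lock().unwrap() -= 1;
        status
    }
}
```

Because nothing is ever queued, memory stays bounded by the cap: excess bot traffic costs the server one cheap rejection rather than a buffered request.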
Where the engineering tradeoffs really are
Miasma also exposes knobs that show this is more than a meme project. Operators can control the link prefix, the number of recursive links, whether responses are force-gzipped to reduce egress, and which poison source is used upstream. The README also tells users to protect legitimate search engines and friendly bots with a carefully written robots.txt, which is crucial if the goal is to target exploitative crawlers rather than break normal discovery.
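That robots.txt advice works because well-behaved crawlers honor exclusion rules and will simply never enter the trap, while exploitative scrapers that ignore robots.txt walk straight in. A minimal sketch, assuming the /bots prefix from the article's example:

```
# Compliant crawlers (search engines, archivers) skip the trap entirely
User-agent: *
Disallow: /bots/
```

The rule is the inverse of the usual defensive posture: instead of trying to keep bad bots out, it politely tells good bots which door not to open.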
The broader significance is that anti-scraping is moving from passive defense toward active cost shifting. Miasma will not solve the underlying policy dispute around AI training data, but it gives smaller publishers an engineering tool that fits directly into an existing reverse-proxy stack. Hacker News attention here reflects a wider mood on the open web: site operators increasingly want something stronger than polite exclusion, but still lightweight enough to deploy without rebuilding their infrastructure.
Primary source: Miasma. Community discussion: Hacker News.
Related Articles
A post on r/LocalLLaMA highlighted Kreuzberg v4.5, a Rust-based document intelligence framework that now adds stronger layout and table understanding. The release claims Docling-level quality with lower memory overhead and materially faster processing.
A widely discussed Hacker News thread surfaced a Rust community summary that sees AI as useful for search, review assistance, and tedious semi-structured work, but risky for learning, subtle defects, ethics, power use, and vendor concentration.
A high-engagement r/MachineLearning discussion introduced IronClaw, a Rust-based AI agent runtime designed around sandboxed tool execution, encrypted credential handling, and database-backed policy controls. The post landed because it treats agent security as a systems problem instead of a prompt-only problem.