LocalLLaMA spotlights a 3.5M-patent search engine built with SQLite FTS5 and Nemotron 9B

Original: I classified 3.5M US patents with Nemotron 9B on a single RTX 5090 — then built a free search engine on top

AI · Mar 8, 2026 · By Insights AI (Reddit) · 2 min read

A well-received post in r/LocalLLaMA this weekend surfaced a practical build that takes a very different route from the usual “embeddings everywhere” search stack. The project is PatentLLM, a free search engine covering 3.5 million US patents from 2016 through 2025. Its creator, a patent lawyer who says he started coding in December 2025, put the entire corpus into a single 74GB SQLite database and then layered full-text search, local LLM classification, and server-side rendering on top.
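The single-file setup can be sketched with Python's built-in sqlite3 module. The schema and column names below are assumptions for illustration, not the project's actual layout; the point is that the corpus and its FTS5 index live in one ordinary SQLite file.

```python
import sqlite3

# Minimal sketch (assumed schema): base table for patent records plus an
# external-content FTS5 index over the searchable text fields.
conn = sqlite3.connect(":memory:")  # in practice, one large .db file on disk
conn.executescript("""
    CREATE TABLE patents (
        id INTEGER PRIMARY KEY,
        pub_number TEXT,
        title TEXT, assignee TEXT, abstract TEXT, claims TEXT
    );
    CREATE VIRTUAL TABLE patents_fts USING fts5(
        title, assignee, abstract, claims,
        content='patents', content_rowid='id'
    );
""")
conn.execute(
    "INSERT INTO patents (pub_number, title, assignee, abstract, claims) "
    "VALUES (?, ?, ?, ?, ?)",
    ("US-0000000-B2", "Solid-state battery electrolyte", "Example Corp",
     "A sulfide-based solid electrolyte.", "1. A battery comprising ..."),
)
# External-content FTS5 tables are populated explicitly from the base table.
conn.execute(
    "INSERT INTO patents_fts (rowid, title, assignee, abstract, claims) "
    "SELECT id, title, assignee, abstract, claims FROM patents"
)
```

The external-content pattern keeps the text stored once in the base table while FTS5 maintains only the inverted index, which matters at 3.5 million documents.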

The core technical decision is intentionally conservative. Instead of using vector search as the retrieval backbone, the project relies on SQLite FTS5 and BM25. The reasoning is domain-specific: patent attorneys often need exact phrase matching, not semantic neighbors that sound close but miss the legal language they are actually searching for. The write-up gives a concrete example: a query for “solid-state battery electrolyte” should return patents containing those precise terms, not a loose cloud of related energy-storage documents.
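That exact-phrase behavior is native to FTS5: double quotes inside a MATCH expression require the tokens to appear as a consecutive phrase. A toy demonstration (table and rows invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts USING fts5(title, abstract)")
conn.executemany("INSERT INTO fts VALUES (?, ?)", [
    ("Solid-state battery electrolyte",
     "A sulfide electrolyte for solid-state cells."),
    ("Energy storage system",
     "Grid-scale lithium-ion battery management."),
])
# The inner double quotes make this a phrase query: only documents
# containing the literal phrase match, not documents that merely
# mention "battery" somewhere.
rows = conn.execute(
    "SELECT title FROM fts WHERE fts MATCH ?",
    ('"solid-state battery electrolyte"',),
).fetchall()
```

The second row mentions "battery" but lacks the phrase, so it is excluded, which is exactly the behavior a semantic-neighbor index would not guarantee.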

That does not mean the project ignores LLMs. Nemotron 9B runs locally on an RTX 5090 to classify all 3.5 million records into 100 tech tags, a job the author says took about 48 hours. A second local LLM layer expands natural-language queries into FTS5 boolean queries. On ranking, the system uses custom BM25 weights that heavily favor title matches, then assignee, then abstract, with claims weighted lower so that long, repetitive claim language does not dominate relevance. According to the author, that weighting better matches how experienced patent searchers triage results.
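Per-column weighting is built into FTS5's bm25() function: its optional arguments assign one weight per indexed column. The weights and rows below are illustrative only, not the project's actual values, but they show the title-first ranking effect described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts USING fts5(title, assignee, abstract, claims)")
conn.executemany("INSERT INTO fts VALUES (?, ?, ?, ?)", [
    ("Battery separator", "Acme Corp", "A porous separator film.",
     "1. A separator for a battery."),
    ("Motor housing", "Battery Corp", "A rigid housing.",
     "1. A housing for a motor."),
    ("Gear assembly", "Acme Corp", "Gears.", "1. A gear."),
    ("Pump rotor", "Flow Inc", "A rotor.", "1. A rotor."),
    ("Valve seal", "Flow Inc", "A seal.", "1. A seal."),
])
# bm25() returns more-negative scores for better matches, so ORDER BY
# ascending ranks best-first. The extra arguments are per-column weights
# in declaration order: title, assignee, abstract, claims.
rows = conn.execute("""
    SELECT title, bm25(fts, 10.0, 5.0, 3.0, 0.5) AS score
    FROM fts WHERE fts MATCH 'battery'
    ORDER BY score
""").fetchall()
```

With these weights, a title hit outranks an assignee hit even though both rows contain the query term.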

The surrounding stack is almost as interesting as the retrieval method. The web app uses FastAPI and Jinja2, and the author says it is hosted through a Chromebook plus Cloudflare Tunnel. The pitch is clear: keep the operations surface small, avoid paid APIs, and let the data stay in a single portable file that can be copied or moved with ordinary tools. For a search product indexing millions of long technical documents, that is a meaningful statement about what “boring infrastructure” can still do.
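The "single portable file" claim is worth making concrete. SQLite's `VACUUM INTO` writes a consistent snapshot of a live database to a new file, so backing up or moving the whole corpus is one statement. This is a general SQLite sketch, not the author's described workflow:

```python
import os
import sqlite3
import tempfile

# A small stand-in database with an FTS5 index, mirroring the idea of
# corpus + index living in one file.
src = sqlite3.connect(":memory:")
src.execute("CREATE VIRTUAL TABLE fts USING fts5(title)")
src.execute("INSERT INTO fts VALUES ('Solid-state battery electrolyte')")
src.commit()

# VACUUM INTO produces a consistent, compact copy even while the
# source connection is open.
dest_path = os.path.join(tempfile.mkdtemp(), "patents-copy.db")
src.execute(f"VACUUM INTO '{dest_path}'")

# The copy is immediately queryable, FTS index included.
copy = sqlite3.connect(dest_path)
rows = copy.execute("SELECT title FROM fts WHERE fts MATCH 'battery'").fetchall()
```

No dump/restore pipeline, no separate index server to resynchronize: the file is the deployment artifact.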

Why did the LocalLLaMA thread resonate? Because it turns local-model enthusiasm into an end-to-end utility with a specific user need. The project uses an LLM where it adds leverage, but it does not force neural methods into the part of the stack where exact symbolic retrieval is stronger. That division of labor makes the build more credible than many generic “AI search” demos.

Community source: r/LocalLLaMA post
Original sources: technical write-up, live search engine


© 2026 Insights. All rights reserved.