LocalLLaMA spotlights a 3.5M-patent search engine built with SQLite FTS5 and Nemotron 9B
Original: I classified 3.5M US patents with Nemotron 9B on a single RTX 5090 — then built a free search engine on top
A well-received post in r/LocalLLaMA this weekend surfaced a practical build that takes a very different route from the usual “embeddings everywhere” search stack. The project is PatentLLM, a free search engine covering 3.5 million US patents from 2016 through 2025. Its creator, a patent lawyer who says he started coding in December 2025, put the entire corpus into a single 74GB SQLite database and then layered full-text search, local LLM classification, and server-side rendering on top.
The core technical decision is intentionally conservative. Instead of using vector search as the retrieval backbone, the project relies on SQLite FTS5 and BM25. The reasoning is domain-specific: patent attorneys often need exact phrase matching, not semantic neighbors that sound close but miss the legal language they are actually searching for. The write-up gives a concrete example: a query for “solid-state battery electrolyte” should return patents containing those precise terms, not a loose cloud of related energy-storage documents.
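The exact-phrase behavior described above is built into FTS5's query syntax. Below is a minimal sketch of that idea using Python's standard-library `sqlite3`; the table and column names are illustrative, not the project's actual schema.

```python
import sqlite3

# In-memory stand-in for the project's single-file patent database.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE patents USING fts5(title, abstract)")
db.executemany(
    "INSERT INTO patents VALUES (?, ?)",
    [
        ("Solid-state battery electrolyte composition",
         "A sulfide electrolyte for all-solid-state cells."),
        ("Lithium-ion thermal management system",
         "Cooling plates for battery packs."),
    ],
)

# Double quotes make FTS5 treat the query as an exact phrase: only rows
# containing that literal token sequence match, with no semantic neighbors.
rows = db.execute(
    "SELECT title FROM patents WHERE patents MATCH ?",
    ['"solid-state battery electrolyte"'],
).fetchall()
print(rows)  # only the first patent matches the phrase
```

Hyphenated terms work here because FTS5's default tokenizer splits them into adjacent tokens, and the quoted phrase requires those tokens in sequence.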
That does not mean the project ignores LLMs. Nemotron 9B runs locally on an RTX 5090 to classify all 3.5 million records into 100 tech tags, a job the author says took about 48 hours. A second local LLM layer expands natural-language queries into FTS5 boolean queries. For ranking, the system uses custom BM25 weights that favor title matches most heavily, then assignee, then abstract, with claims weighted lowest so that verbose claim language does not dominate relevance. According to the author, that weighting better matches how experienced patent searchers triage results.
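FTS5 supports this kind of per-column weighting directly through its `bm25()` auxiliary function. The sketch below is a hedged illustration: the schema, weight values, and sample rows are assumptions, since the write-up only states the ordering title > assignee > abstract > claims.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE VIRTUAL TABLE patents USING fts5(title, assignee, abstract, claims)"
)
db.executemany(
    "INSERT INTO patents VALUES (?, ?, ?, ?)",
    [
        ("Solid electrolyte membrane", "Acme Energy",
         "A ceramic separator.", "1. A membrane comprising a ceramic."),
        ("Battery cell housing", "Foo Corp",
         "A housing with an electrolyte reservoir.",
         "1. A housing wherein the electrolyte contacts the electrolyte "
         "reservoir and the electrolyte inlet."),
        # Filler rows so the query term is rare enough to carry weight.
        ("Gear assembly", "Bar Inc", "A gear train.", "1. A gear."),
        ("Display panel", "Baz Ltd", "An OLED stack.", "1. A panel."),
        ("Antenna array", "Qux LLC", "A phased array.", "1. An antenna."),
        ("Valve actuator", "Quux Co", "A solenoid valve.", "1. A valve."),
    ],
)

# bm25() takes one weight per column; a hit in a heavily weighted column
# counts for more. Claims get a low weight so repetitive claim language
# cannot dominate. Smaller bm25() values are better, so order ascending.
rows = db.execute(
    """
    SELECT title, bm25(patents, 10.0, 4.0, 2.0, 0.5) AS score
    FROM patents
    WHERE patents MATCH 'electrolyte'
    ORDER BY score
    """
).fetchall()
print([title for title, _ in rows])
```

Note the second patent repeats "electrolyte" three times in its claims, yet the single title hit in the first patent still ranks higher because of the 10.0-vs-0.5 weight gap.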
The surrounding stack is almost as interesting as the retrieval method. The web app uses FastAPI and Jinja2, and the author says it is hosted on a Chromebook behind a Cloudflare Tunnel. The pitch is clear: keep the operations surface small, avoid paid APIs, and let the data stay in a single portable file that can be copied or moved with ordinary tools. For a search product indexing millions of long technical documents, that is a meaningful statement about what “boring infrastructure” can still do.
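The write-up does not publish the exact serving commands, but the stack it names maps onto a very small recipe; the module path and port below are illustrative assumptions.

```shell
# Serve the FastAPI app locally with uvicorn (assumed module layout:
# app.py exposing a FastAPI instance named `app`).
uvicorn app:app --host 127.0.0.1 --port 8000

# Expose the local port through a Cloudflare quick tunnel; cloudflared
# prints a public URL and no inbound ports need to be opened.
cloudflared tunnel --url http://127.0.0.1:8000
```

This is the whole operational footprint: one process for the app, one for the tunnel, and a single SQLite file on disk.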
Why did the LocalLLaMA thread resonate? Because it turns local-model enthusiasm into an end-to-end utility with a specific user need. The project uses an LLM where it adds leverage, but it does not force neural methods into the part of the stack where exact symbolic retrieval is stronger. That division of labor makes the build more credible than many generic “AI search” demos.
Community source: r/LocalLLaMA post
Original sources: technical write-up, live search engine