LocalLLaMA Surfaces SentrySearch, a Local Qwen3-VL Workflow for Semantic Video Search
Original: Semantic video search using local Qwen3-VL embedding, no API, no transcription
An r/LocalLLaMA post on March 30, 2026 highlighted a practical local multimodal workflow: semantic video search built on Qwen3-VL-Embedding. By March 31, the post had reached 301 points and 45 comments, suggesting genuine developer interest rather than a one-off demo.
How the project works
In the Reddit post, the author says SentrySearch embeds raw video chunks directly into the same vector space as text queries. The pipeline splits MP4 footage into overlapping chunks, stores the chunk embeddings in ChromaDB, and automatically trims out the best-matching segment for each query. According to the author, the local backend uses Qwen3-VL-Embedding instead of cloud APIs, so the system runs without transcription, frame captioning, or a text-only intermediate layer.
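The post does not include code, but the chunk-and-retrieve flow is easy to sketch in plain Python: compute overlapping chunk windows, then find the chunk whose embedding is nearest the query vector (the lookup a vector store like ChromaDB would perform). The chunk length and overlap values here are illustrative guesses, not SentrySearch's actual settings, and the embedding step itself is omitted:

```python
import math

def chunk_windows(duration_s, chunk_s=8.0, overlap_s=2.0):
    """Return (start, end) times for overlapping chunks covering a video.

    chunk_s and overlap_s are illustrative; the project picks its own values.
    """
    step = chunk_s - overlap_s
    windows, t = [], 0.0
    while t < duration_s:
        windows.append((t, min(t + chunk_s, duration_s)))
        if t + chunk_s >= duration_s:
            break
        t += step
    return windows

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_chunk(query_vec, chunk_vecs):
    """Index of the chunk embedding closest to the text-query embedding."""
    return max(range(len(chunk_vecs)),
               key=lambda i: cosine(query_vec, chunk_vecs[i]))
```

The winning index maps back to a `(start, end)` window, which can then be cut from the source file (for example with ffmpeg's `-ss`/`-to` options) to produce the trimmed clip the post describes.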
- The Reddit post says the local 8B model produced usable results on Apple Silicon and CUDA.
- The same post estimates about 18 GB RAM for the 8B model and about 6 GB for the 2B model.
- The project README says the local path auto-selects Qwen 8B on Macs with 24 GB or more RAM and falls back to Qwen 2B on smaller systems.
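The README's auto-selection rule is simple enough to state as code. A minimal sketch, taking total system RAM as an input; the model identifier strings are placeholders, since the README describes the rule rather than exact names:

```python
def pick_backend_model(total_ram_gb: float) -> str:
    """README rule: prefer the 8B embedding model with 24 GB+ RAM,
    fall back to the 2B model on smaller systems.

    Returned identifiers are illustrative placeholders, not the
    project's actual model strings.
    """
    if total_ram_gb >= 24:
        return "Qwen3-VL-Embedding-8B"  # ~18 GB RAM per the Reddit post
    return "Qwen3-VL-Embedding-2B"      # ~6 GB RAM per the Reddit post
```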
The README adds more engineering detail. Video is downscaled to 480p at 5 fps before embedding, capped at 32 sampled frames per chunk, and stored with a truncated 768-dimensional representation to reduce storage and similarity-search cost. The project also keeps backend-specific indexes separate, which matters because embeddings from different models are not compatible.
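Two of those details, the 32-frame cap per chunk and the 768-dimension truncation, translate directly into code. A standard-library sketch; the L2 renormalization after truncation is an assumption on my part (it is common practice for Matryoshka-style truncated embeddings, but the README only mentions the 768-dim cut):

```python
import math

def sample_frame_times(start_s, end_s, fps=5.0, max_frames=32):
    """Frame timestamps at 5 fps, capped at 32 per chunk (per the README)."""
    times, t = [], start_s
    while t < end_s and len(times) < max_frames:
        times.append(t)
        t += 1.0 / fps
    return times

def truncate_embedding(vec, dim=768):
    """Keep the first `dim` dimensions, then L2-renormalize.

    Renormalization is an assumption, not confirmed by the README.
    """
    v = vec[:dim]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]
```

The cap means a chunk longer than 6.4 seconds (32 frames / 5 fps) contributes at most 32 frames to its embedding, which bounds per-chunk compute regardless of chunk length.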
Why the community cares
The main appeal is not just that AI can search video. It is that a multimodal model can compare raw video and text directly enough to support offline workflows, private footage, and predictable latency on consumer hardware. That is a different proposition from caption-then-search pipelines.
For LocalLLaMA users, the post is a concrete example of Qwen3-VL being used as infrastructure instead of as a chat demo. The community source is the Reddit thread; the primary project source is the SentrySearch repository.
Related Articles
Hacker News pushed Ente's Ensu announcement because it treats local LLM software as a privacy and ownership product: offline chat across major platforms, open source core logic, and planned encrypted sync.
Show HN users were drawn to SentrySearch because it turns Gemini Embedding 2's native video embeddings into a practical CLI for semantic search and clip extraction.
A LocalLLaMA thread about Intel’s Arc Pro B70 and B65 reached 213 upvotes and 133 comments. Intel says the B70 is available from March 25, 2026 with a suggested starting price of $949, while the B65 follows in mid-April.