LocalLLaMA Surfaces SentrySearch, a Local Qwen3-VL Workflow for Semantic Video Search
Original: Semantic video search using local Qwen3-VL embedding, no API, no transcription
An r/LocalLLaMA post on March 30, 2026 highlighted a practical local multimodal workflow: semantic video search built on Qwen3-VL-Embedding. By March 31, the post had reached 301 points and 45 comments, suggesting genuine developer interest rather than a one-off demo.
How the project works
In the Reddit post, the author says SentrySearch embeds raw video chunks directly into the same vector space as text queries. The pipeline splits MP4 footage into overlapping chunks, stores the chunk embeddings in ChromaDB, and automatically trims out the best-matching segment for each query. According to the author, the local backend uses Qwen3-VL-Embedding instead of cloud APIs, so the system runs without transcription, frame captioning, or a text-only intermediate layer.
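The post does not include code, but the chunk-and-retrieve flow is easy to sketch in plain Python: compute overlapping chunk windows, then find the chunk whose embedding is nearest the query vector (the lookup a vector store like ChromaDB would perform). The chunk length and overlap values here are illustrative guesses, not SentrySearch's actual settings, and the embedding step itself is omitted:

```python
import math

def chunk_windows(duration_s, chunk_s=8.0, overlap_s=2.0):
    """Return (start, end) times for overlapping chunks covering a video.

    chunk_s and overlap_s are illustrative; the project picks its own values.
    """
    step = chunk_s - overlap_s
    windows, t = [], 0.0
    while t < duration_s:
        windows.append((t, min(t + chunk_s, duration_s)))
        if t + chunk_s >= duration_s:
            break
        t += step
    return windows

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_chunk(query_vec, chunk_vecs):
    """Index of the chunk embedding closest to the text-query embedding."""
    return max(range(len(chunk_vecs)),
               key=lambda i: cosine(query_vec, chunk_vecs[i]))
```

The winning index maps back to a `(start, end)` window, which can then be cut from the source file (for example with ffmpeg's `-ss`/`-to` options) to produce the trimmed clip the post describes.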
- The Reddit post says the local 8B model produced usable results on Apple Silicon and CUDA.
- The same post estimates about 18 GB RAM for the 8B model and about 6 GB for the 2B model.
- The project README says the local path auto-selects Qwen 8B on Macs with 24 GB or more RAM and falls back to Qwen 2B on smaller systems.
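The README's auto-selection rule is simple enough to state as code. A minimal sketch, taking total system RAM as an input; the model identifier strings are placeholders, since the README describes the rule rather than exact names:

```python
def pick_backend_model(total_ram_gb: float) -> str:
    """README rule: prefer the 8B embedding model with 24 GB+ RAM,
    fall back to the 2B model on smaller systems.

    Returned identifiers are illustrative placeholders, not the
    project's actual model strings.
    """
    if total_ram_gb >= 24:
        return "Qwen3-VL-Embedding-8B"  # ~18 GB RAM per the Reddit post
    return "Qwen3-VL-Embedding-2B"      # ~6 GB RAM per the Reddit post
```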
The README adds more engineering detail. Video is downscaled to 480p at 5 fps before embedding, capped at 32 sampled frames per chunk, and stored with a truncated 768-dimensional representation to reduce storage and similarity-search cost. The project also keeps backend-specific indexes separate, which matters because embeddings from different models are not compatible.
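Two of those details, the 32-frame cap per chunk and the 768-dimension truncation, translate directly into code. A standard-library sketch; the L2 renormalization after truncation is an assumption on my part (it is common practice for Matryoshka-style truncated embeddings, but the README only mentions the 768-dim cut):

```python
import math

def sample_frame_times(start_s, end_s, fps=5.0, max_frames=32):
    """Frame timestamps at 5 fps, capped at 32 per chunk (per the README)."""
    times, t = [], start_s
    while t < end_s and len(times) < max_frames:
        times.append(t)
        t += 1.0 / fps
    return times

def truncate_embedding(vec, dim=768):
    """Keep the first `dim` dimensions, then L2-renormalize.

    Renormalization is an assumption, not confirmed by the README.
    """
    v = vec[:dim]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]
```

The cap means a chunk longer than 6.4 seconds (32 frames / 5 fps) contributes at most 32 frames to its embedding, which bounds per-chunk compute regardless of chunk length.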
Why the community cares
The main appeal is not just that AI can search video. It is that a multimodal model can compare raw video and text directly enough to support offline workflows, private footage, and predictable latency on consumer hardware. That is a different proposition from caption-then-search pipelines.
For LocalLLaMA users, the post is a concrete example of Qwen3-VL being used as infrastructure instead of as a chat demo. The community source is the Reddit thread; the primary project source is the SentrySearch repository.
Related Articles
Hacker News pushed Ente's Ensu announcement because it treats local LLM software as a privacy and ownership product: offline chat across major platforms, open source core logic, and planned encrypted sync.
Show HN users were drawn to SentrySearch because it turns Gemini Embedding 2's native video embeddings into a practical CLI for semantic search and clip extraction.
A LocalLLaMA thread about Intel’s Arc Pro B70 and B65 reached 213 upvotes and 133 comments. Intel says the B70 is available from March 25, 2026 with a suggested starting price of $949, while the B65 follows in mid-April.