LocalLLaMA Surfaces SentrySearch, a Local Qwen3-VL Workflow for Semantic Video Search

Original: Semantic video search using local Qwen3-VL embedding, no API, no transcription

LLM · Mar 31, 2026 · By Insights AI (Reddit) · 2 min read

A r/LocalLLaMA post on March 30, 2026 highlighted a practical local multimodal workflow: semantic video search built on Qwen3-VL-Embedding. By March 31, the post had 301 points and 45 comments, suggesting genuine developer interest rather than a one-off demo.

How the project works

In the Reddit post, the author says SentrySearch embeds raw video chunks directly into the same vector space as text queries. The pipeline splits MP4 footage into overlapping chunks, stores the chunk embeddings in ChromaDB, and, when a query matches, automatically trims out the best-matching segment. The author says the local backend uses Qwen3-VL-Embedding instead of cloud APIs, so the system runs without transcription, frame captioning, or a text-only middle layer.
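The post does not include code, but the overlapping-chunk step can be sketched as sliding time windows over the video. The chunk length and overlap below are illustrative assumptions, not values from the project:

```python
def chunk_windows(duration_s: float, chunk_s: float = 8.0, overlap_s: float = 2.0):
    """Split a video of duration_s seconds into overlapping (start, end) windows.

    chunk_s and overlap_s are assumed defaults for illustration; the SentrySearch
    post only says the chunks overlap, not by how much.
    """
    if overlap_s >= chunk_s:
        raise ValueError("overlap must be shorter than the chunk length")
    step = chunk_s - overlap_s
    windows = []
    t = 0.0
    while t < duration_s:
        windows.append((t, min(t + chunk_s, duration_s)))
        if t + chunk_s >= duration_s:
            break  # last window already reaches the end of the video
        t += step
    return windows
```

Each window would then be decoded, embedded with the video embedding model, and stored; at query time the text query is embedded into the same space and the nearest window gives the segment to trim.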

  • The Reddit post says the local 8B model produced usable results on Apple Silicon and CUDA.
  • The same post estimates about 18 GB RAM for the 8B model and about 6 GB for the 2B model.
  • The project README says the local path auto-selects Qwen 8B on Macs with 24 GB or more RAM and falls back to Qwen 2B on smaller systems.
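The README's auto-selection rule can be sketched as a single RAM check. The model identifier strings below are assumptions; the README only specifies the 24 GB threshold and the 8B/2B split:

```python
def pick_local_model(total_ram_gb: float) -> str:
    """Pick the local embedding model by available RAM, per the README's rule:
    Qwen 8B on machines with 24 GB or more, Qwen 2B otherwise.

    The exact model names are hypothetical placeholders.
    """
    if total_ram_gb >= 24:
        return "Qwen3-VL-Embedding-8B"
    return "Qwen3-VL-Embedding-2B"
```

This lines up with the post's memory estimates: roughly 18 GB for the 8B model (leaving headroom on a 24 GB machine) and roughly 6 GB for the 2B fallback.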

The README adds more engineering detail. Video is downscaled to 480p at 5 fps before embedding, capped at 32 sampled frames per chunk, and stored with a truncated 768-dimensional representation to reduce storage and similarity-search cost. The project also keeps backend-specific indexes separate, which matters because embeddings from different models are not compatible.
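Those two caps — at most 32 sampled frames per chunk and a truncated 768-dimensional vector — can be sketched as follows. The even-spacing frame sampler and the renormalization after truncation are assumptions on my part, not details from the README:

```python
import math

def sample_frame_indices(n_frames: int, max_frames: int = 32):
    """Pick at most max_frames frame indices from a decoded chunk.

    Even spacing is an assumed strategy; the README only states the 32-frame cap.
    """
    if n_frames <= max_frames:
        return list(range(n_frames))
    step = n_frames / max_frames
    return [int(i * step) for i in range(max_frames)]

def truncate_embedding(vec, dim: int = 768):
    """Keep the first dim components and renormalize to unit length.

    Renormalizing after truncation (Matryoshka-style) is an assumption; the
    README only says a truncated 768-dim representation is stored.
    """
    v = vec[:dim]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]
```

Truncating the stored vectors cuts both disk usage and the cost of each similarity search, since distance computations scale with dimensionality.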

Why the community cares

The main appeal is not just that AI can search video. It is that a multimodal model can compare raw video and text directly enough to support offline workflows, private footage, and predictable latency on consumer hardware. That is a different proposition from caption-then-search pipelines.

For LocalLLaMA users, the post is a concrete example of Qwen3-VL being used as infrastructure instead of as a chat demo. The community source is the Reddit thread; the primary project source is the SentrySearch repository.



© 2026 Insights. All rights reserved.