Hacker News highlights a practical video-search CLI built on Gemini Embedding 2
Original: Show HN: Gemini can now natively embed video, so I built sub-second video search
A March 24, 2026 Show HN post drew attention because it demonstrated one of the more concrete uses of multimodal embeddings: searching raw video without first turning the footage into text. The project, SentrySearch, packages that idea into a local CLI aimed at dashcam and security video.
According to the repository and the HN write-up, the tool splits footage into overlapping chunks, embeds each chunk directly as video with Gemini Embedding 2, stores the resulting vectors in a local ChromaDB index, and matches a natural-language query against that same embedding space. The top result can then be auto-trimmed back into a clip.
- No transcription or frame-captioning stage is required before search.
- Default indexing uses 30-second chunks with overlap, plus still-frame detection so long idle segments can be skipped rather than embedded.
- The project estimates about $2.50 per hour of footage at default settings, with lower real costs when parked or low-motion footage can be skipped.
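The chunking step described above can be sketched as a small generator that walks the timeline in fixed, overlapping windows. The 30-second chunk length follows the stated default; the 5-second overlap is an assumed value, since the post mentions overlap but not its size, and this is an illustrative sketch rather than SentrySearch's actual code.

```python
def chunk_schedule(total_seconds: float, chunk_len: float = 30.0, overlap: float = 5.0):
    """Yield (start, duration) pairs covering the footage with overlapping chunks.

    chunk_len matches the tool's stated 30-second default; the overlap
    value is an assumption for illustration.
    """
    step = chunk_len - overlap
    start = 0.0
    while start < total_seconds:
        # The final chunk is shortened so it never runs past the end of the file.
        yield (start, min(chunk_len, total_seconds - start))
        start += step
```

For a 70-second clip this yields chunks starting at 0, 25, and 50 seconds, so an event near a 30-second boundary still lands fully inside at least one chunk, which is the point of overlapping in the first place.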
That matters because video search projects often fail at the operational layer: they typically require a hosted product, a heavyweight vision stack, or manual labeling. SentrySearch instead treats Gemini's video embedding endpoint as an infrastructure primitive and wraps it in something a developer can run locally with Python, ffmpeg, and an API key.
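The search-and-trim side of the workflow can be sketched as follows. The real tool ranks chunks in ChromaDB using Gemini Embedding 2 vectors; here the index is a plain dictionary and the ranking is explicit cosine similarity, so the `best_chunk` and `trim_command` helpers are hypothetical stand-ins, not SentrySearch's API. The ffmpeg flags are standard input-seeking and stream-copy options.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_chunk(query_vec, index):
    """Return the (start, duration) key of the chunk closest to the query.

    `index` maps (start, duration) -> embedding vector, standing in for
    the ChromaDB collection; query_vec is the embedded natural-language query.
    """
    return max(index, key=lambda key: cosine(query_vec, index[key]))

def trim_command(src, start, duration, dst="clip.mp4"):
    """Build an ffmpeg command that stream-copies the matched chunk to a clip."""
    return ["ffmpeg", "-ss", str(start), "-t", str(duration),
            "-i", src, "-c", "copy", dst]
```

Because the query is embedded into the same space as the video chunks, nearest-neighbor ranking is the entire retrieval step; no transcript or caption index exists to consult.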
The limitations are also spelled out. Chunk boundaries can miss events that span multiple segments, still-frame detection is heuristic, and Gemini Embedding 2 is still in preview, so both behavior and pricing can move. Even with those caveats, the HN post landed because it translated a new model capability into a workflow that feels immediately usable rather than purely demo-oriented.
Primary source: SentrySearch repository. Community source: Hacker News discussion.
Related Articles
Google AI Studio said in a March 19, 2026 post on X that its vibe coding workflow now supports multiplayer collaboration, live data connections, persistent builds, and shadcn, Framer Motion, and npm support. The update pushes AI Studio closer to a browser-based app-building environment instead of a prompt-only prototype tool.
Google AI Studio promoted Gemini Embedding 2 in a March 12, 2026 X post, and Google’s March 10 blog post says the model maps text, images, video, audio, and documents into a single embedding space. Google says it is in public preview through the Gemini API and Vertex AI and is designed for multimodal retrieval and classification.
Google has put Gemini Embedding 2 into public preview through the Gemini API and Vertex AI. The model is Google’s first natively multimodal embedding system, combining text, image, video, audio, and document inputs in one embedding space.