Reddit picks up Netflix’s VOID video object-deletion model
Original post title: “Netflix releases Void, a video model that can remove objects from video and their physical interactions on the scene”
A Reddit thread in /r/singularity surfaced Netflix’s new VOID repository to a broader AI audience; at crawl time the post had 198 upvotes and 29 comments. The repository packages the code, links to model weights, a demo, a Colab notebook, and the arXiv paper for a research system focused on deleting objects, and their interactions, from video.
The key claim is stronger than ordinary inpainting: VOID is designed to remove an object from a video and also undo the interactions that object caused in the scene. The repository’s example is direct: if a person holding a guitar is erased, VOID is meant to remove the person’s effect on the guitar as well, so the instrument falls naturally instead of freezing in place. That is why the project matters: many consumer video-editing tools can clean up pixels; far fewer try to repair causal consequences across time.
Technically, Netflix says VOID is built on top of CogVideoX and fine-tuned for video inpainting with interaction-aware mask conditioning. The repo describes a two-stage transformer setup: pass one is the base inpainting model, and pass two adds warped-noise refinement for better temporal consistency on longer clips. For mask generation, the pipeline pairs Gemini (via the Google AI API) with SAM2, so the system combines a video generator, segmentation, and reasoning about which regions the removed object influenced.
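The repo does not spell out the mask-composition logic, but the core idea of interaction-aware conditioning can be sketched: the inpainting mask must cover not only the segmented object but also nearby regions the object physically influences (contact points, supported items, shadows). A minimal illustration, assuming the per-frame object mask comes from a segmenter like SAM2, and using simple morphological dilation as a stand-in for the learned influence reasoning the repo attributes to Gemini:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def interaction_aware_mask(object_mask: np.ndarray,
                           influence_radius: int = 5) -> np.ndarray:
    """Expand a binary object mask to cover regions the object may influence.

    VOID reportedly uses Gemini + SAM2 to reason about influenced regions;
    fixed dilation here is only a hypothetical stand-in for that step.
    """
    structure = np.ones((3, 3), dtype=bool)  # 8-connected neighborhood
    return binary_dilation(object_mask.astype(bool), structure=structure,
                           iterations=influence_radius)

# Toy 8x8 frame with a 2x2 "object" in the middle.
mask = np.zeros((8, 8), dtype=bool)
mask[3:5, 3:5] = True
expanded = interaction_aware_mask(mask, influence_radius=2)
# The conditioning region grows from the object's 4 cells to 36 cells,
# giving the inpainting model room to repaint affected surroundings.
```

The design point is that the generator is conditioned on the larger region, not just the object silhouette, which is what lets it repaint consequences (a dropped guitar, a vacated chair) rather than only the object’s pixels.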
The release is unusually complete for an open research drop: Netflix links model weights on Hugging Face, a browser demo, and an open notebook. At the same time, the practical requirements are not trivial. The quick-start notebook notes that users need a GPU with 40 GB or more of VRAM, such as an A100, and the full setup is heavier still if you want to run both inference passes and the mask pipeline yourself.
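Before attempting the notebook locally, it is worth checking whether your GPU clears that bar. A small sketch using PyTorch’s standard device-query API (the 40 GB figure comes from the repo’s quick-start notes; the helper function is ours, not part of VOID):

```python
REQUIRED_GIB = 40  # minimum VRAM per the VOID quick-start notes

def has_enough_vram(total_bytes: int, required_gib: int = REQUIRED_GIB) -> bool:
    """Return True if a device with `total_bytes` of memory meets the bar."""
    return total_bytes >= required_gib * 1024**3

try:
    import torch
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        verdict = "OK" if has_enough_vram(props.total_memory) else "too small"
        print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB -> {verdict}")
    else:
        print("No CUDA device detected; VOID inference targets ~40 GiB GPUs such as an A100.")
except ImportError:
    print("PyTorch not installed; cannot query GPU memory.")
```

On consumer cards (typically 8–24 GB) the check fails, which matches the article’s point that VOID is a research release rather than a lightweight creator tool.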
That trade-off is probably why the Reddit post resonated. VOID is not a lightweight creator tool yet, but it is a concrete example of open video-editing research moving from “erase an object” toward “repair the scene dynamics after removal.” For researchers and infrastructure teams, that is the more interesting technical milestone.
Related Articles
A March 2026 Hacker News post reached 252 points and 261 comments around George London’s argument that coding agents could make free software relevant again. The core claim is that agents turn source-code access from a symbolic programmer right into a practical capability for ordinary users who need software changed on their behalf.
Cohere has entered the speech stack race with Transcribe, a 2B Apache 2.0 ASR model for 14 languages. Open weights, Hugging Face distribution, and a claimed 5.42 average WER headline the release.
Together AI said on April 3, 2026 that Wan 2.7 from Alibaba Cloud is now available on its platform. The accompanying product post says text-to-video is live now, with image-to-video, reference-to-video, and video edit workflows rolling out on the same API, auth, and billing surface.