r/LocalLLaMA Highlights Netflix's Open VOID Video Deletion Model

An r/LocalLLaMA post surged past 1,100 upvotes after pointing to Netflix's first public model release on Hugging Face: VOID, short for Video Object and Interaction Deletion. What made the post stand out was not just that another company published weights, but that the model targets a harder version of video inpainting. According to the model card and the GitHub repo, VOID tries to remove an object together with the physical interactions it induces in the scene, not just obvious traces such as shadows or reflections.

The published materials describe VOID as a fine-tuned system built on CogVideoX-Fun-V1.5-5b-InP. It uses interaction-aware quadmask conditioning, where different mask values represent the primary object to remove, overlap regions, affected regions, and protected background. Netflix says the model can handle cases where deleting a person should also change what happens to nearby objects, such as a guitar that would naturally fall once the person holding it disappears.
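To make the four-way mask concrete, here is a minimal sketch in Python of how a single frame's quadmask could be assembled from two binary masks. The function name `build_quadmask` and the gray levels are assumptions chosen for illustration, not values taken from the VOID repo; the model card is the place to check which encoding the released checkpoints expect.

```python
# Hypothetical gray levels for the four regions. These numbers are
# placeholders for illustration; the released model may use different ones.
REMOVE = 0      # primary object to delete
OVERLAP = 63    # pixels where the object and the affected region coincide
AFFECTED = 127  # region whose content should change once the object is gone
KEEP = 255      # protected background


def build_quadmask(object_mask, affected_mask):
    """Combine two same-sized binary masks (lists of rows of 0/1) into one quadmask frame."""
    if len(object_mask) != len(affected_mask):
        raise ValueError("masks must have the same height")
    quadmask = []
    for obj_row, aff_row in zip(object_mask, affected_mask):
        if len(obj_row) != len(aff_row):
            raise ValueError("masks must have the same width")
        row = []
        for obj, aff in zip(obj_row, aff_row):
            if obj and aff:
                row.append(OVERLAP)
            elif obj:
                row.append(REMOVE)
            elif aff:
                row.append(AFFECTED)
            else:
                row.append(KEEP)
        quadmask.append(row)
    return quadmask


# Toy one-row frame: the person covers the first two pixels, and the area
# where the guitar would fall covers the middle two.
person = [[1, 1, 0, 0]]
guitar_fall_zone = [[0, 1, 1, 0]]
frame = build_quadmask(person, guitar_fall_zone)
```

In a real pipeline the two input masks would come from a segmentation step and would be produced per frame; this sketch only shows how the four categories relate to each other.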

The base architecture is a 5B CogVideoX 3D Transformer.
The default output resolution is 384x672, with support for up to 197 frames.
Pass 1 is a base inpainting model, while Pass 2 refines temporal consistency with warped-noise initialization.
The quick-start notebook requires a GPU with 40GB+ VRAM, such as an A100.
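
For a rough sense of scale, the sketch below works out the raw pixel volume of a maximum-length clip at the default resolution and the latent grid the transformer would operate on. The 8x spatial and 4x temporal compression factors are an assumption based on the CogVideoX family's video VAE, not a figure from the VOID materials, and the helper name `latent_grid` is hypothetical.

```python
HEIGHT, WIDTH, MAX_FRAMES = 384, 672, 197  # figures from the list above

# Assumed compression factors (CogVideoX-style causal video VAE).
# These are not stated in the VOID materials.
SPATIAL_FACTOR = 8
TEMPORAL_FACTOR = 4


def latent_grid(frames, height, width):
    """Return (latent_frames, latent_height, latent_width) under the assumed factors."""
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("height and width must be multiples of the spatial factor")
    # A causal video VAE keeps the first frame on its own and compresses
    # the remaining frames in groups of TEMPORAL_FACTOR.
    latent_frames = (frames - 1) // TEMPORAL_FACTOR + 1
    return latent_frames, height // SPATIAL_FACTOR, width // SPATIAL_FACTOR


raw_rgb_bytes = HEIGHT * WIDTH * MAX_FRAMES * 3  # uint8 RGB, before any encoding
grid = latent_grid(MAX_FRAMES, HEIGHT, WIDTH)
```

Under those assumptions, a 197-frame clip maps to 50 latent frames on a 48x84 grid, which gives some intuition for why long clips push the memory requirement up.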

The repo also makes the workflow unusually concrete for an open release. The README documents the CLI, expected input layout, optional two-pass inference, and a mask-generation pipeline that combines SAM2 with Gemini to build quadmasks from raw video. Training details are also public: the authors say VOID was trained on paired counterfactual videos from HUMOTO and Kubric, using 8x A100 80GB GPUs with DeepSpeed ZeRO Stage 2.
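
As an illustration of the training setup mentioned above, a DeepSpeed ZeRO Stage 2 configuration can be written as a Python dict. Only the stage number and the GPU count come from the published details; the batch sizes, the precision setting, and the communication flags below are placeholders, not the authors' settings.

```python
NUM_GPUS = 8  # from the published details: 8x A100 80GB

# Hypothetical values for illustration; the authors' actual batch sizes
# and precision settings are not given in this article.
MICRO_BATCH_PER_GPU = 1
GRAD_ACCUM_STEPS = 4

ds_config = {
    "train_micro_batch_size_per_gpu": MICRO_BATCH_PER_GPU,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    # DeepSpeed expects train_batch_size to equal
    # micro batch * gradient accumulation steps * number of GPUs.
    "train_batch_size": MICRO_BATCH_PER_GPU * GRAD_ACCUM_STEPS * NUM_GPUS,
    "bf16": {"enabled": True},  # assumption: mixed precision in bfloat16
    "zero_optimization": {
        "stage": 2,  # shard optimizer states and gradients across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
```

Stage 2 keeps a full copy of the model parameters on every GPU while splitting optimizer states and gradients, which is a common middle ground for a model of this size on 80GB cards.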

The Reddit discussion was enthusiastic for a reason. One widely upvoted reply highlighted the claim that VOID handles physical interactions, calling that especially impressive. Another commenter joked that Netflix is acting more open source than some frontier-model labs. That mix of novelty and reproducibility is why the post landed so well in r/LocalLLaMA: it is not just a flashy demo, but a release with weights, code, a notebook, and enough system detail for people to test the claim themselves.
