Netflix VOID
Remove objects from video — shadows, reflections, and physics included
On April 3, 2026, Netflix open-sourced VOID (Video Object and Interaction Deletion), an AI framework developed with INSAIT Sofia University that removes objects from videos while rewriting the downstream physical effects those objects created. Give it a video and a text description of what to remove; Gemini 3 Pro identifies affected scene areas, SAM2 segments the objects, and a fine-tuned CogVideoX diffusion model regenerates the scene. In user studies, VOID was preferred 64.8% of the time versus 18.4% for Runway. Licensed under Apache 2.0, with a Hugging Face demo and a paper on arXiv.
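The three-stage flow described above can be sketched as a simple orchestration. This is a minimal illustration, not VOID's actual API: the real models (Gemini 3 Pro, SAM2, CogVideoX) are replaced by stubs, and every function and field name here is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of VOID's three-stage pipeline.
# Stage models are stubbed out; names are illustrative only.

@dataclass
class Frame:
    pixels: list                         # placeholder for image data
    affected_regions: list = field(default_factory=list)
    masks: list = field(default_factory=list)

def identify_affected_areas(frames, prompt):
    # Stage 1: a vision-language model (Gemini 3 Pro in VOID) reads the
    # prompt and flags regions the object influences, including secondary
    # effects like shadows and reflections.
    for f in frames:
        f.affected_regions = [f"region-for:{prompt}"]
    return frames

def segment_objects(frames):
    # Stage 2: a segmenter (SAM2 in VOID) turns flagged regions into masks.
    for f in frames:
        f.masks = [f"mask:{r}" for r in f.affected_regions]
    return frames

def regenerate_scene(frames):
    # Stage 3: a video diffusion model (fine-tuned CogVideoX in VOID)
    # inpaints the masked regions, rewriting the downstream physics.
    return [
        Frame(pixels=["regenerated"],
              affected_regions=f.affected_regions,
              masks=f.masks)
        for f in frames
    ]

def remove_object(frames, prompt):
    frames = identify_affected_areas(frames, prompt)
    frames = segment_objects(frames)
    return regenerate_scene(frames)

clip = [Frame(pixels=["raw"]) for _ in range(3)]
result = remove_object(clip, "the boom mic and its shadow")
print(len(result), result[0].pixels[0])  # 3 regenerated
```

The point of the sketch is the ordering: segmentation depends on the language model's region proposals, and the diffusion stage consumes masks rather than the raw prompt, which is what lets the final stage repaint shadows and reflections the object left behind.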
Panel Reviews
“The language-driven interface — describe what to remove, no manual masking — makes this immediately integrable into a post-production pipeline without specialist operators. Apache 2.0 means commercial use is clean. The optional optical flow correction step handles shape distortion that kills most inpainting tools. This is directly deployable for b-roll cleanup, brand removal, and localization workflows.”
“CogVideoX-Fun-5B is a 5-billion parameter diffusion model — inference is slow and expensive without serious GPU capacity. The 25-person user study is not a production benchmark. Most of the 'physics rewriting' shown in demos involves simple shadows and reflections; complex rigid-body interactions like toppling stacks of objects are conspicuously absent from examples. Impressive research paper, but the gap between paper results and reliable production use is wide.”
“Hollywood-grade object removal has been a $50k/day VFX pipeline for twenty years. VOID is the first credible signal it becomes a commodity API call. When this runs in real-time on next-generation consumer GPUs, it eliminates entire categories of reshoots and location-dependent production constraints. Netflix open-sourcing it means every competitor will build on it — the floor for professional video tools just moved up dramatically.”
“I spent three hours testing VOID on location footage with unwanted signage and a boom mic visible in two shots. Both cleaned up correctly including the shadow the mic cast on a white wall. The Hugging Face demo is slow but the results are genuinely usable without post-processing. This alone would have cost me a day of manual roto work. Absolute ship for anyone doing content production.”
Community Sentiment
“Netflix open-sourcing their production video AI is not what I expected this week”
“The physics-aware part is the key — most tools just inpaint, this one rewrites causality”
“VOID on HuggingFace is already breaking — everyone is trying it”