Netflix Releases First Public AI Model for Advanced Video Object Removal on Hugging Face
Key Takeaways
- Netflix released its first public AI model on Hugging Face, specializing in video object removal with physics-aware cleanup of shadows, reflections, and collision effects
- The model is based on Alibaba's CogVideoX-Fun-V1.5-5b and supports up to 197 frames at 384×672 resolution with optional two-pass inference for improved quality
- Positioned as research-oriented rather than production-ready, the model targets VFX and post-production workflows to reduce manual editing labor
Summary
Netflix has released its first public AI model on Hugging Face, marking the streaming giant's entry into open-source AI development. The model, called Void, removes selected objects from video footage and also eliminates the physical effects those objects created, such as shadows, reflections, knocked-over items, and collision artifacts. This sets it apart from standard inpainting tools: the goal is a scene that looks as if the removed objects had never been present.
The model is built on a fine-tuned version of Alibaba's CogVideoX-Fun-V1.5-5b, a 5-billion-parameter video diffusion model. It accepts video input along with text prompts and quadmasks that specify which regions to remove and which to preserve. It supports up to 197 frames at 384×672 resolution and includes an optional two-pass inference mode to reduce artifacts and improve temporal consistency. Netflix emphasizes that the model is research-oriented rather than production-ready: it requires 40 GB or more of VRAM (an A100 or equivalent) and local setup. The code is available on GitHub and the weights on Hugging Face.
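The limits quoted above (197 frames, 384×672, 40 GB or more of VRAM) can be turned into a small pre-flight check before attempting a local run. The following is a minimal sketch, not part of Netflix's released code: `plan_clip` and `ClipPlan` are hypothetical names, reading 384 as the height and 672 as the width is an assumption, and splitting a longer clip into independent chunks is only an illustrative workaround that may cost temporal consistency at chunk boundaries.

```python
from dataclasses import dataclass

# Limits quoted in the article. Treating 384 as height and 672 as width
# is an assumption; the article only says "384×672".
MAX_FRAMES = 197
TARGET_HEIGHT = 384
TARGET_WIDTH = 672
MIN_VRAM_GB = 40


@dataclass
class ClipPlan:
    chunks: list        # (start, end) frame ranges, end exclusive
    scaled_size: tuple  # (height, width) after fitting inside the target
    enough_vram: bool   # True if the GPU meets the quoted 40 GB minimum


def plan_clip(num_frames, height, width, vram_gb):
    """Hypothetical helper: check a clip against the quoted limits."""
    if num_frames <= 0 or height <= 0 or width <= 0:
        raise ValueError("clip dimensions and frame count must be positive")

    # Split the clip into consecutive windows of at most MAX_FRAMES frames.
    chunks = [
        (start, min(start + MAX_FRAMES, num_frames))
        for start in range(0, num_frames, MAX_FRAMES)
    ]

    # Uniform scale factor so the frame fits inside 384x672 while
    # keeping its aspect ratio.
    scale = min(TARGET_HEIGHT / height, TARGET_WIDTH / width)
    scaled_size = (round(height * scale), round(width * scale))

    return ClipPlan(chunks, scaled_size, vram_gb >= MIN_VRAM_GB)


if __name__ == "__main__":
    # A 400-frame 1080p clip on a 24 GB GPU.
    plan = plan_clip(num_frames=400, height=1080, width=1920, vram_gb=24)
    print(plan.chunks)
    print(plan.scaled_size)
    print(plan.enough_vram)
```

A check like this only reports whether a clip and GPU fall within the published numbers; the actual preprocessing the model expects (padding, cropping, mask format) should be taken from the released inference scripts.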
The primary use case is VFX and post-production work, where the model could reduce manual, frame-by-frame cleanup. Netflix's decision to open-source this capability reflects the broader industry trend of major tech companies sharing AI models and tools with the developer community.
- Code, model weights, and inference scripts are publicly available on GitHub and Hugging Face, requiring local setup and significant VRAM (40GB+)
Editorial Opinion
Netflix's entry into open-source AI shows that even consumer-facing entertainment companies are using AI to solve real production challenges. By releasing a physics-aware video inpainting model, Netflix is sharing genuinely useful tooling with the broader VFX and creative community rather than an incremental improvement. The move suggests that sophisticated video AI capabilities are becoming commoditized; expect other studios and production companies to follow with their own specialized models.



