Meituan puts LongCat-Video-Avatar 1.5 on Hugging Face with MIT license

AI | May 25, 2026 | By Insights AI
Open avatar generation gets a stronger reference point

Audio-driven avatar generation is moving beyond closed demos and into model hubs where developers can inspect, run, and adapt the stack. In the source tweet, Gorden Sun described LongCat-Video-Avatar 1.5 as an “audio-driven video generation” model.

The project page says LongCat-Video-Avatar 1.5 was built by the Meituan LongCat Team on top of LongCat-Video. Its demos cover lip-sync, singing, animation, and multi-person interaction. The 1.0-to-1.5 comparison emphasizes better mouth-shape accuracy, stronger identity preservation in long videos, broader interaction scenarios, and faster 8-step generation. The comparison section names HeyGen, Kling Avatar 2.0, and OmniHuman-1.5, placing the release directly against commercial and frontier avatar systems.

The Hugging Face model card is the practical part of the story. The model is tagged for Diffusers, ONNX, Safetensors, and Transformers, and the task tags include audio-text-to-video, audio-image-text-to-video, audio-driven-video-continuation, avatar, and video-generation. It also lists an MIT license and provides starter code for using the model with Diffusers, lowering the barrier for developers who want to test the release locally or build evaluation harnesses around it.
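
For developers building an evaluation harness, one cheap first step is to confirm that the card metadata still matches what is reported here before downloading any weights. The following is a minimal sketch under stated assumptions: the `check_card` helper, the dictionary layout, and the hand-typed `example_card` are all hypothetical and are not taken from the model card or from any Hugging Face API. Only the license and tag names come from the article.

```python
# License and task tags as reported in the article. The dict layout used
# below is an assumption for illustration, not the Hub's actual schema.
EXPECTED_LICENSE = "mit"
EXPECTED_TASK_TAGS = {
    "audio-text-to-video",
    "audio-image-text-to-video",
    "audio-driven-video-continuation",
    "avatar",
    "video-generation",
}


def check_card(card: dict) -> list[str]:
    """Return human-readable problems; an empty list means the card matches."""
    problems = []

    # Compare licenses case-insensitively so "MIT" and "mit" both pass.
    license_name = str(card.get("license", "")).lower()
    if license_name != EXPECTED_LICENSE:
        problems.append(f"unexpected license: {license_name or 'missing'}")

    # Extra tags (for example library tags) are fine; only missing ones count.
    missing = EXPECTED_TASK_TAGS - set(card.get("tags", []))
    if missing:
        problems.append("missing tags: " + ", ".join(sorted(missing)))

    return problems


# Hypothetical metadata typed in by hand; a real harness would load it
# from the model card instead.
example_card = {"license": "MIT", "tags": sorted(EXPECTED_TASK_TAGS) + ["diffusers"]}
print(check_card(example_card))
```

A check like this only guards against metadata drift. It says nothing about output quality, which still has to be measured on generated video.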

The next thing to watch is how developers reconcile openness with deployment risk. The project page says some demo images and audio come from real videos for academic demonstration, while the Hugging Face card asks downstream users to evaluate accuracy, safety, fairness, data protection, privacy, and content safety before sensitive use. If independent tests confirm stable identity, lip motion, and inference speed, LongCat-Video-Avatar 1.5 could become a useful baseline for open avatar research. If not, its largest impact may still be forcing clearer comparisons in a market where many avatar systems remain difficult to audit.
