Meituan’s LongCat team released an audio-driven avatar video model with Diffusers examples and an MIT license on Hugging Face. The project compares against HeyGen, Kling Avatar 2.0, and OmniHuman-1.5.
#video-generation
At Google I/O 2026, Google DeepMind unveiled Gemini Omni, its first model capable of generating video from any input including text, images, audio, and video. Combining Gemini's intelligence with Google's generative media systems, it is available now through the Gemini app and YouTube Shorts.
Google revealed Gemini Omni at I/O 2026—a "world model" that processes text, audio, images, and video together to simulate physical environments. Unlike Sora or Runway, it lets users edit footage through natural language and maintains scene consistency across modifications. It replaces Veo in the Gemini app immediately.
NVIDIA Labs released SANA-WM, a 2.6B-parameter open-source world model capable of generating up to one minute of 720p video. The relatively small model size and open-source availability make it a significant contribution to accessible video generation research.
A video believed to be from Google's unreleased 'Omni' video generation model has leaked, drawing 1,300+ upvotes on r/singularity. Users particularly noted the model's unusually coherent text rendering, a persistent weakness in current video generation models.
Google introduced Veo 3.1 Lite as its most cost-effective video generation model, priced at less than 50% of Veo 3.1 Fast while keeping the same speed. The model is rolling out through the paid tier of the Gemini API and Google AI Studio, broadening access to higher-volume video app use cases.
Netflix’s VOID reached Reddit as an open research release aimed at removing objects from video and repairing the interactions those objects caused in the scene. The notable details are the CogVideoX base, a two-pass pipeline, Gemini+SAM2 mask generation, and a 40GB+ VRAM requirement.
Together AI said on April 3, 2026 that Wan 2.7 from Alibaba Cloud is now available on its platform. The accompanying product post says text-to-video is live now, with image-to-video, reference-to-video, and video edit workflows rolling out on the same API, auth, and billing surface.
OpenAI said on March 23, 2026 that Sora videos include visible and invisible provenance signals, including C2PA metadata, alongside consent controls and tighter rules for videos involving real people. The company also described teen-specific protections, content filters across video and audio, and blocks on music that imitates living artists or existing works.
The Financial Times reports that DeepSeek V4 is set to launch next week, featuring image and video generation capabilities that position it as a direct competitor to multimodal AI models from OpenAI and Google.
ByteDance's Seedance 2.0 has arrived seemingly out of nowhere, generating hyperrealistic AI videos that have Hollywood insiders deeply concerned. The model creates footage indistinguishable from real camera recordings using a single text prompt.
ByteDance's Seedance 2.0 accepts text, images, video clips, and audio simultaneously to generate up to 20-second 1080p video, drawing immediate copyright cease-and-desist letters from Disney and Paramount within days of launch.