Google DeepMind Launches Gemini Omni: Generate Video from Any Input
What Is Gemini Omni
Google DeepMind unveiled Gemini Omni at Google I/O 2026 on May 19 — the first model in the Omni family capable of generating video from virtually any input: text, images, audio, or existing video. The model combines Gemini's world knowledge with Google's generative media infrastructure, representing what the company describes as "a leap forward in world understanding, multimodality, and editing."
What It Can Do
Gemini Omni Flash generates short video clips grounded in real-world knowledge. Users can upload a photo and generate multiple video variations, or apply follow-up prompts to transform framing and style. The model features improved understanding of physical forces such as gravity, kinetic energy, and fluid dynamics, enabling more realistic simulations. Flash clips are capped at 10 seconds. All generated videos carry Google's imperceptible SynthID digital watermark for content provenance.
Availability
Gemini Omni Flash is available immediately to Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow. Users 18 and older can access it for free through YouTube Shorts Remix and the YouTube Create app. A developer API is expected in the coming weeks.
Context: Google I/O 2026
The Omni launch was among the headline announcements at Google I/O 2026, alongside Gemini 3.5 Flash and a personal AI agent called Gemini Spark. CEO Sundar Pichai opened the keynote by saying, "We are firmly in our agentic Gemini era." By releasing Omni Flash first, Google signals a strategy of staying at the frontier while deploying AI at scale across billions of users, intensifying the multimodal race with OpenAI and others.
Related Articles
Google revealed Gemini Omni at I/O 2026—a "world model" that processes text, audio, images, and video together to simulate physical environments. Unlike Sora or Runway, it lets users edit footage through natural language and maintains scene consistency across modifications. It replaces Veo in the Gemini app immediately.
Why it matters: retrieval stacks are being pulled from text-only search into multimodal memory. Google AI Studio said Gemini Embedding 2 is generally available and covers text, image, video, audio, and documents through one model path.
Google has updated the Gemini API File Search tool to support multimodal content including images, audio, and video, making it easier for developers to build efficient, verifiable RAG systems.