Google opens Gemini Embedding 2 preview as its first natively multimodal embedding model

Original: Start building with Gemini Embedding 2, our most capable and first fully multimodal embedding model built on the Gemini architecture. Now available in preview via the Gemini API and in Vertex AI.

LLM · Mar 13, 2026 · By Insights AI · 2 min read

Google AI Developers announced on X on March 10, 2026 that Gemini Embedding 2 is now available in preview via the Gemini API and Vertex AI. The company described it as its most capable embedding model and its first fully multimodal embedding system built on the Gemini architecture. In a follow-up thread and its official blog post, Google said the model maps text, images, video, audio, and documents into a single unified embedding space, which means developers can search, classify, and cluster different media types without stitching together separate embedding stacks.

That design choice addresses a practical problem in modern retrieval systems. Many enterprise and consumer applications no longer work with text alone. Product manuals include diagrams, support records include screenshots, research datasets include PDFs and video clips, and voice systems generate audio alongside transcripts. Google says Gemini Embedding 2 can represent these formats natively while preserving semantic relationships across more than 100 languages. The company is positioning the model as infrastructure for multimodal RAG, semantic search, recommendation, and analytics pipelines.
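To make the "single unified embedding space" idea concrete, here is a minimal in-memory retrieval sketch. The vectors are hypothetical stand-ins for model output (in a real pipeline each one would come from an embedding call, whatever the item's media type), and `search` is an illustrative helper, not part of the Gemini API.

```python
import math

# Hypothetical precomputed embeddings (stand-ins for model output).
# Because text, images, and audio share one space, they live in one index.
index = {
    "manual_page.txt":    [0.9, 0.1, 0.0],
    "wiring_diagram.png": [0.8, 0.2, 0.1],
    "support_call.wav":   [0.1, 0.9, 0.3],
}

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, items, top_k=2):
    # Rank every item, regardless of media type, in the same vector space.
    scored = sorted(items.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:top_k]]

query = [0.85, 0.15, 0.05]  # hypothetical embedding of a text query
print(search(query, index))  # text and image results outrank the audio clip
```

The point of the sketch is the absence of per-modality plumbing: one similarity function and one index serve all media types, which is exactly the "no separate embedding stacks" claim from the announcement.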

  • Text inputs support up to 8192 tokens.
  • The model can process up to 6 images per request, video clips up to 120 seconds, native audio, and PDFs up to 6 pages.
  • Output dimensions are flexible, with a default of 3072 and smaller options that let teams balance quality against storage and serving cost.
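The flexible output dimension translates directly into storage cost. A back-of-envelope calculation, using the announced 3072-dimension default and a hypothetical smaller option of 768 dimensions (the announcement does not list the specific smaller sizes), assuming float32 vectors:

```python
# Back-of-envelope index storage at different embedding sizes.
# 3072 is the announced default; 768 is a hypothetical smaller option
# used here purely for comparison.
BYTES_PER_FLOAT32 = 4
NUM_VECTORS = 1_000_000

def index_size_gb(dim, n=NUM_VECTORS):
    # Raw vector storage only; real vector DBs add index overhead.
    return dim * BYTES_PER_FLOAT32 * n / 1e9

print(f"3072-d: {index_size_gb(3072):.1f} GB")  # ~12.3 GB
print(f" 768-d: {index_size_gb(768):.1f} GB")   # ~3.1 GB
```

At a million vectors the difference is roughly 12.3 GB versus 3.1 GB, which is why dimension choice matters long before model quality does.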

Google also says Gemini Embedding 2 uses Matryoshka Representation Learning, which lets developers shrink embedding dimensionality without training separate models for each footprint target. That matters because embeddings are usually deployed at high volume, where vector database size, network bandwidth, and retrieval latency directly affect cost. A single multimodal model with flexible dimensionality can simplify architecture while still giving teams room to optimize production performance.
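Client-side, Matryoshka Representation Learning means a smaller embedding is simply a re-normalized prefix of the full vector; the training procedure that makes prefixes meaningful happens on the model side and is out of scope here. A minimal sketch of that truncation step, using a toy 8-dimensional vector in place of a real 3072-dimensional embedding:

```python
import math

def truncate_embedding(vec, dim):
    # Matryoshka-style truncation: keep the first `dim` coordinates,
    # then re-normalize so cosine similarity stays well defined.
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-d vector standing in for a full 3072-d embedding.
full = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
small = truncate_embedding(full, 4)
print(small)  # a unit-length 4-d vector
```

No second model or re-embedding pass is needed to serve the smaller footprint, which is the operational win the paragraph above describes.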

The strategic significance is not just that Google launched another embedding model. The more important shift is that multimodal retrieval is moving closer to a default assumption rather than a specialized add-on. If a single API call can place text, images, audio, video, and short documents in one semantic space, developers can spend less time on preprocessing glue and more time on ranking, policy, and application behavior. Gemini Embedding 2’s public preview is therefore as much an infrastructure announcement as a model announcement.



© 2026 Insights. All rights reserved.