Google opens Gemini Embedding 2 preview as its first natively multimodal embedding model
"Start building with Gemini Embedding 2, our most capable and first fully multimodal embedding model built on the Gemini architecture. Now available in preview via the Gemini API and in Vertex AI." — Google AI Developers
Google AI Developers announced on X on March 10, 2026, that Gemini Embedding 2 is now available in preview via the Gemini API and Vertex AI. The company described it as its most capable embedding model and its first fully multimodal embedding model built on the Gemini architecture. In a follow-up thread and its official blog post, Google said the model maps text, images, video, audio, and documents into a single unified embedding space, which means developers can search, classify, and cluster different media types without stitching together separate embedding stacks.
That design choice addresses a practical problem in modern retrieval systems. Many enterprise and consumer applications no longer work with text alone. Product manuals include diagrams, support records include screenshots, research datasets include PDFs and video clips, and voice systems generate audio alongside transcripts. Google says Gemini Embedding 2 can represent these formats natively while preserving semantic relationships across more than 100 languages. The company is positioning the model as infrastructure for multimodal RAG, semantic search, recommendation, and analytics pipelines.
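The practical payoff of a single embedding space is that retrieval stops caring about media type: a text query and an image, audio clip, or PDF page all become vectors that can be compared directly. A minimal sketch of that comparison step, using placeholder random vectors where the model's output would go (the function name and 8-dimensional vectors here are illustrative, not part of the Gemini API):

```python
import numpy as np

def cosine_top_k(query_vec, item_vecs, k=3):
    """Rank items by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    m = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]

# Placeholder 8-dim vectors standing in for real model output; in a
# production index, each row would be an embedding of a text snippet,
# image, audio clip, or PDF page produced by the same model.
rng = np.random.default_rng(0)
items = rng.normal(size=(5, 8))
query = items[2] + 0.05 * rng.normal(size=8)  # query close to item 2

print(cosine_top_k(query, items, k=2))  # item 2 should rank first
```

Because every item lives in one space, the same ranking code serves text-to-image, audio-to-text, or any other cross-modal lookup; the only per-modality work is producing the embedding in the first place.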
- Text inputs support up to 8192 tokens.
- The model can process up to 6 images per request, video clips up to 120 seconds, native audio, and PDFs up to 6 pages.
- Output dimensions are flexible, with a default of 3072 and smaller options that let teams balance quality against storage and serving cost.
Google also says Gemini Embedding 2 uses Matryoshka Representation Learning, which lets developers shrink embedding dimensionality without training separate models for each footprint target. That matters because embeddings are usually deployed at high volume, where vector database size, network bandwidth, and retrieval latency directly affect cost. A single multimodal model with flexible dimensionality can simplify architecture while still giving teams room to optimize production performance.
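Mechanically, using a Matryoshka-style embedding at a smaller footprint means keeping only the leading dimensions of the full vector and re-normalizing. This sketch shows just that truncation step with a random stand-in vector; whether quality holds up at a given size depends on the model's MRL training, not on this code:

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of an MRL-style embedding and
    re-normalize to unit length. Matryoshka training packs the most
    information into the leading dimensions, which is what makes this
    truncation usable for retrieval."""
    v = np.asarray(vec, dtype=np.float32)[:dim]
    return v / np.linalg.norm(v)

full = np.random.default_rng(1).normal(size=3072)  # stand-in for model output
small = truncate_embedding(full, 768)
print(small.shape, round(float(np.linalg.norm(small)), 3))  # (768,) 1.0
```

Dropping from 3072 to 768 float32 dimensions cuts vector storage and index bandwidth by 4x, which is where the cost argument in the paragraph above comes from.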
The strategic significance is not just that Google launched another embedding model. The more important shift is that multimodal retrieval is moving closer to a default assumption rather than a specialized add-on. If a single API call can place text, images, audio, video, and short documents in one semantic space, developers can spend less time on preprocessing glue and more time on ranking, policy, and application behavior. Gemini Embedding 2’s public preview is therefore as much an infrastructure announcement as a model announcement.
Related Articles
Google AI Developers says Gemini Embedding 2 is now in preview via the Gemini API and Vertex AI. Google describes it as its first fully multimodal embedding model on the Gemini architecture and its most capable embedding model so far.
Google on March 3, 2026 introduced Gemini 3.1 Flash-Lite as the fastest and most cost-efficient model in the Gemini 3 family. The preview is rolling out through Google AI Studio and Vertex AI at $0.25/1M input tokens and $1.50/1M output tokens.
Google put Gemini Embedding 2 into public preview on March 10, 2026. The company says the model handles text, images, and mixed multimodal documents in one embedding space while improving benchmark scores to 68.32 for text and 53.3 for image tasks without changing price or vector dimensions.