Google opens Gemini Embedding 2 preview as its first natively multimodal embedding model

Original: Start building with Gemini Embedding 2, our most capable and first fully multimodal embedding model built on the Gemini architecture. Now available in preview via the Gemini API and in Vertex AI.

LLM · Mar 13, 2026 · By Insights AI · 2 min read

Google AI Developers announced on X on March 10, 2026 that Gemini Embedding 2 is now available in preview via the Gemini API and Vertex AI. The company described it as its most capable embedding model and its first fully multimodal embedding system built on the Gemini architecture. In a follow-up thread and its official blog post, Google said the model maps text, images, video, audio, and documents into a single unified embedding space, which means developers can search, classify, and cluster different media types without stitching together separate embedding stacks.

That design choice addresses a practical problem in modern retrieval systems. Many enterprise and consumer applications no longer work with text alone. Product manuals include diagrams, support records include screenshots, research datasets include PDFs and video clips, and voice systems generate audio alongside transcripts. Google says Gemini Embedding 2 can represent these formats natively while preserving semantic relationships across more than 100 languages. The company is positioning the model as infrastructure for multimodal RAG, semantic search, recommendation, and analytics pipelines.
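To make the "single unified embedding space" idea concrete, here is a minimal in-memory retrieval sketch. The vectors are hypothetical stand-ins for model output (in a real pipeline each one would come from an embedding call, whatever the item's media type), and `search` is an illustrative helper, not part of the Gemini API.

```python
import math

# Hypothetical precomputed embeddings (stand-ins for model output).
# Because text, images, and audio share one space, they live in one index.
index = {
    "manual_page.txt":    [0.9, 0.1, 0.0],
    "wiring_diagram.png": [0.8, 0.2, 0.1],
    "support_call.wav":   [0.1, 0.9, 0.3],
}

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, items, top_k=2):
    # Rank every item, regardless of media type, in the same vector space.
    scored = sorted(items.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:top_k]]

query = [0.85, 0.15, 0.05]  # hypothetical embedding of a text query
print(search(query, index))  # text and image results outrank the audio clip
```

The point of the sketch is the absence of per-modality plumbing: one similarity function and one index serve all media types, which is exactly the "no separate embedding stacks" claim from the announcement.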

  • Text inputs support up to 8192 tokens.
  • The model can process up to 6 images per request, video clips up to 120 seconds, native audio, and PDFs up to 6 pages.
  • Output dimensions are flexible, with a default of 3072 and smaller options that let teams balance quality against storage and serving cost.
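The flexible output dimension translates directly into storage cost. A back-of-envelope calculation, using the announced 3072-dimension default and a hypothetical smaller option of 768 dimensions (the announcement does not list the specific smaller sizes), assuming float32 vectors:

```python
# Back-of-envelope index storage at different embedding sizes.
# 3072 is the announced default; 768 is a hypothetical smaller option
# used here purely for comparison.
BYTES_PER_FLOAT32 = 4
NUM_VECTORS = 1_000_000

def index_size_gb(dim, n=NUM_VECTORS):
    # Raw vector storage only; real vector DBs add index overhead.
    return dim * BYTES_PER_FLOAT32 * n / 1e9

print(f"3072-d: {index_size_gb(3072):.1f} GB")  # ~12.3 GB
print(f" 768-d: {index_size_gb(768):.1f} GB")   # ~3.1 GB
```

At a million vectors the difference is roughly 12.3 GB versus 3.1 GB, which is why dimension choice matters long before model quality does.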

Google also says Gemini Embedding 2 uses Matryoshka Representation Learning, which lets developers shrink embedding dimensionality without training separate models for each footprint target. That matters because embeddings are usually deployed at high volume, where vector database size, network bandwidth, and retrieval latency directly affect cost. A single multimodal model with flexible dimensionality can simplify architecture while still giving teams room to optimize production performance.
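Client-side, Matryoshka Representation Learning means a smaller embedding is simply a re-normalized prefix of the full vector; the training procedure that makes prefixes meaningful happens on the model side and is out of scope here. A minimal sketch of that truncation step, using a toy 8-dimensional vector in place of a real 3072-dimensional embedding:

```python
import math

def truncate_embedding(vec, dim):
    # Matryoshka-style truncation: keep the first `dim` coordinates,
    # then re-normalize so cosine similarity stays well defined.
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-d vector standing in for a full 3072-d embedding.
full = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
small = truncate_embedding(full, 4)
print(small)  # a unit-length 4-d vector
```

No second model or re-embedding pass is needed to serve the smaller footprint, which is the operational win the paragraph above describes.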

The strategic significance is not just that Google launched another embedding model. The more important shift is that multimodal retrieval is moving closer to a default assumption rather than a specialized add-on. If a single API call can place text, images, audio, video, and short documents in one semantic space, developers can spend less time on preprocessing glue and more time on ranking, policy, and application behavior. Gemini Embedding 2’s public preview is therefore as much an infrastructure announcement as a model announcement.



© 2026 Insights. All rights reserved.