Google puts Gemini Embedding 2 into public preview as its first natively multimodal embedding model
What Google Announced
Google announced Gemini Embedding 2 in public preview on March 10, 2026, describing it as the company's first natively multimodal embedding model. Instead of limiting embeddings to text, Google says the model can represent text, images, and mixed multimodal documents, such as PDFs that combine writing, figures, and charts, in a shared vector space.
That matters because many production retrieval systems still split text and image indexing into separate pipelines. Teams often have to maintain multiple embedding models or add translation layers between text and vision retrieval. Google’s pitch for Gemini Embedding 2 is that a single model can simplify that stack and make multimodal search, recommendation, and RAG systems easier to build and operate.
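A minimal sketch of what "one model, one index" means in practice: every item, whatever its modality, becomes a vector in the same space, so a single nearest-neighbor search replaces parallel text and image pipelines. The `embed` function below is a stand-in, not the Gemini API; it just produces deterministic mock vectors so the retrieval step is runnable.

```python
import math
import random

DIM = 8  # toy dimension; production embedding models use hundreds or thousands

def embed(item):
    """Stand-in for a multimodal embedding call. In a real system this
    would send text, an image, or a PDF page to the embedding API and
    get back one vector regardless of modality."""
    random.seed(hash(item) % (2**32))  # deterministic mock vector per item
    return [random.gauss(0, 1) for _ in range(DIM)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# One index for every modality: no separate text and image pipelines.
corpus = {
    "faq.txt": embed("faq.txt"),
    "diagram.png": embed("diagram.png"),
    "report.pdf": embed("report.pdf"),
}

query_vec = embed("how do I reset my password?")
ranked = sorted(corpus, key=lambda k: cosine(query_vec, corpus[k]), reverse=True)
print(ranked)  # all three documents, ranked against one query vector
```

The operational win is that the ranking loop never needs to know which modality a document came from; only the (hypothetical) `embed` call does.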
What Google Claims Improved
Google says Gemini Embedding 2 lifts text benchmark performance from 62.3 to 68.32 and reaches 53.3 on image benchmarks, while preserving the same price and vector dimensions as the earlier Gemini Embedding offering. From an adoption standpoint, that is one of the most important details in the announcement. Better quality without changing vector size or cost means teams can upgrade retrieval quality without redesigning storage layouts or blowing up serving economics.
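To see why unchanged vector dimensions matter operationally, a rough storage estimate helps. The 3072-dimension figure below is an assumption based on earlier Gemini embedding models, not a number from this announcement; the point is that if the dimension stays fixed, the existing index footprint and ANN configuration carry over unchanged.

```python
# Back-of-envelope index sizing, assuming float32 vectors.
# dims = 3072 is an assumption (earlier Gemini embedding models
# defaulted to 3072); the announcement does not state the dimension.
num_vectors = 10_000_000   # e.g. 10M indexed chunks
dims = 3072                # assumed embedding dimension
bytes_per_float = 4        # float32

total_bytes = num_vectors * dims * bytes_per_float
print(f"{total_bytes / 1e9:.1f} GB")  # prints "122.9 GB"
```

If an upgraded model changed the dimension, every stored vector would have to be re-embedded and the index rebuilt, which is exactly the migration cost Google's same-dimension claim avoids.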
The multimodal document angle is also important. Real enterprise knowledge bases are full of slide decks, PDFs, product sheets, scanned forms, and reports that mix text with diagrams and screenshots. A model that embeds those artifacts more directly can improve recall and ranking in systems where plain text search has been a weak point.
Why It Matters
Gemini Embedding 2 is not a headline chatbot launch, but it is a meaningful infrastructure release. In many AI products, retrieval quality is the hidden bottleneck behind generation quality. By treating multimodal embeddings as a core production feature rather than a research extra, Google is signaling that multimodal RAG and search are moving into the standard application stack.
Source: Google