Google DeepMind brings Gemini Embedding 2 to preview for multimodal retrieval
Google DeepMind said on X on March 10, 2026 that Gemini Embedding 2 is now available in preview through the Gemini API and Vertex AI. The company describes it as the first fully multimodal embedding model built on the Gemini architecture, designed to map text, images, video, audio, and documents into a shared vector space.
That description is more significant than it may sound. Many production retrieval systems still rely on separate models for text search, image search, document indexing, and media understanding. A genuinely multimodal embedding layer can simplify the stack by letting teams store and compare different content types in one representation. That matters for enterprise search, recommendation systems, multimodal RAG, and any workflow where users mix screenshots, PDFs, voice notes, or clips with text queries.
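The payoff of a shared vector space is that retrieval becomes a single nearest-neighbor search, regardless of modality. A minimal sketch (the 4-dimensional vectors and file names below are toy stand-ins; a real system would obtain each vector from the embedding API, one call per text, image, or PDF item):

```python
import numpy as np

def cosine_top_k(query: np.ndarray, store: dict, k: int = 2) -> list:
    """Rank stored items by cosine similarity to the query vector."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(store, key=lambda name: cos(query, store[name]), reverse=True)
    return ranked[:k]

# Toy vectors standing in for model outputs across different modalities.
store = {
    "report.pdf":     np.array([0.9, 0.1, 0.0, 0.1]),
    "screenshot.png": np.array([0.1, 0.9, 0.1, 0.0]),
    "voice_note.wav": np.array([0.0, 0.2, 0.9, 0.1]),
}
query = np.array([0.8, 0.2, 0.1, 0.0])  # embedding of a text query
print(cosine_top_k(query, store))  # → ['report.pdf', 'screenshot.png']
```

Because every content type lives in the same space, the same ranking function serves text-to-image, text-to-audio, and text-to-document lookups; without a shared space, each pairing would need its own model and index.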
Google says the model supports more than 100 languages and can process mixed inputs rather than only one modality at a time. In the launch materials, the company also highlights support for up to 8,192 text tokens, up to 6 images per request, short video and audio inputs, and PDF documents. The model also exposes multiple output sizes, including 3,072, 1,536, and 768 dimensions, using Matryoshka Representation Learning so teams can trade retrieval quality against storage and serving cost.
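The Matryoshka property is what makes those multiple output sizes cheap: a smaller embedding is just the leading coordinates of the full one, re-normalized. The API may well expose a dimensionality parameter directly; the sketch below only illustrates the underlying recipe, with a random vector standing in for a real 3,072-dimension model output:

```python
import numpy as np

def truncate_mrl(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize to unit length,
    the standard way to shrink a Matryoshka-trained embedding."""
    head = embedding[:dim]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(0)
full = rng.normal(size=3072)      # stand-in for a full-size model output
small = truncate_mrl(full, 768)   # 4x less storage per stored vector

print(small.shape)                # (768,)
```

The practical consequence: a team can index the 768-dimension version for cheap first-pass retrieval and keep the full 3,072-dimension vectors for re-ranking, without running the model twice.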
The competitive context is also notable. Embeddings rarely get the same attention as flagship chat models, but they quietly determine how much real-world information a system can retrieve and rank before any generation step begins. By pushing a fully multimodal embedding model into preview, Google DeepMind is moving the Gemini family deeper into the infrastructure layer that powers search, knowledge systems, and agent memory.
For developers, the practical takeaway is straightforward: if Gemini Embedding 2 performs well in production, it could reduce the number of specialized vector pipelines they need to maintain. That can lower system complexity, make multimodal retrieval more natural, and give Google a stronger position in the part of the AI stack that sits underneath assistants, copilots, and enterprise knowledge tools.