Google DeepMind unveils Gemini 3.1 Flash-Lite as low-cost, high-speed model
Original post: "Gemini 3.1 Flash-Lite has landed as the most cost-efficient Gemini 3 model"
Launch signal from X and official blog
On March 3, 2026 (UTC), Google DeepMind posted on X that Gemini 3.1 Flash-Lite has landed, describing it as the most cost-efficient model in the Gemini 3 series. At collection time, the post had recorded 7,804 likes, 267 replies, and 1,233,045 views. The announcement aligns with Google's detailed release note, "Gemini 3.1 Flash-Lite: Built for intelligence at scale."
Published economics and performance claims
Google says Flash-Lite is priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens. The same post cites Artificial Analysis benchmarks claiming a 2.5× faster time to first answer token and a 45% increase in output speed versus Gemini 2.5 Flash, while maintaining similar or better quality. Google also cites benchmark figures including an Elo of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro.
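To make the published rates concrete, here is a minimal cost sketch using only the per-token prices quoted above ($0.25 per 1M input tokens, $1.50 per 1M output tokens); the workload shape (500 input / 100 output tokens per request) is an illustrative assumption, not a Google figure.

```python
# Cost sketch at the published Gemini 3.1 Flash-Lite rates.
INPUT_RATE = 0.25 / 1_000_000   # USD per input token
OUTPUT_RATE = 1.50 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the published rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical workload: 1M requests of 500 input / 100 output tokens.
total = 1_000_000 * request_cost(500, 100)
print(f"${total:,.2f}")  # → $275.00
```

At these rates, a million short moderation-style calls lands in the low hundreds of dollars, which is the "intelligence at scale" positioning the post leans on.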
Deployment implications for builders
The rollout is described as preview availability via the Gemini API in Google AI Studio and enterprise access through Vertex AI. Google positions the model for high-volume tasks such as translation, moderation, and responsive app experiences where latency and cost are both hard constraints. In practice, this launch is a direct play for production workloads that need predictable cost profiles without giving up multimodal and reasoning coverage.
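For builders trying the preview, a call through the Gemini API might look like the sketch below. It uses the google-genai Python SDK's standard `Client.models.generate_content` pattern; the model id "gemini-3.1-flash-lite" and the moderation prompt are assumptions for illustration — confirm the actual preview id in Google AI Studio.

```python
# Sketch: a moderation-style call via the Gemini API (google-genai SDK).
# The model id below is an assumed preview name; verify in Google AI Studio.
import os

def build_moderation_prompt(text: str) -> str:
    """Wrap user content in a one-word moderation instruction."""
    return ("Classify the following text as SAFE or UNSAFE, "
            "answering with one word only:\n\n" + text)

def moderate(text: str) -> str:
    """Send the prompt to the (assumed) Flash-Lite preview model."""
    from google import genai  # pip install google-genai
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    resp = client.models.generate_content(
        model="gemini-3.1-flash-lite",  # assumed preview id
        contents=build_moderation_prompt(text),
    )
    return resp.text.strip()

if __name__ == "__main__":
    if os.environ.get("GEMINI_API_KEY"):
        print(moderate("Have a great day!"))
    else:
        print("Set GEMINI_API_KEY to run the live call.")
```

For the high-volume use cases Google names (translation, moderation), the same pattern would typically be wrapped in batching and retry logic; this sketch shows only the single-call shape.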
Sources: Google DeepMind X post, Google blog post
Related Articles
Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026. According to Google’s official post, the model is launching in preview with low per-token pricing and a speed-focused profile for high-volume developer workloads.
Google DeepMind said on March 3, 2026 that Gemini 3.1 Flash-Lite delivers faster performance at a lower price than Gemini 2.5 Flash. Google is rolling the model out in preview via Google AI Studio and Vertex AI for high-volume, latency-sensitive workloads.
Google AI shared practical Gemini 3.1 Flash-Lite examples, including high-volume image sorting and business automation scenarios. The thread also points developers to preview access via Gemini API, Google AI Studio, and Vertex AI.