Google DeepMind launches Gemini 3.1 Flash-Lite in preview
Original: Gemini 3.1 Flash-Lite has landed. It’s our most cost-efficient Gemini 3 series model yet, built for intelligence at scale. Here’s what’s new 🧵 View original →
Launch signal from X and Google’s official write-up
Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026 (UTC), calling it the most cost-efficient model in the Gemini 3 series. The launch thread links to Google’s detailed product post. Source X post: nitter.net/GoogleDeepMind/status/2028872381477929185. Product details: blog.google/.../gemini-3-1-flash-lite.
Public numbers and positioning
Google says Flash-Lite is rolling out in preview via the Gemini API in Google AI Studio and via Vertex AI for enterprise users. Listed pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens. The company cites Artificial Analysis and claims better price-performance than Gemini 2.5 Flash, including a 2.5x faster time to first answer token and a 45% output speed increase.
Google also published benchmark snapshots: Elo 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. The message is that Flash-Lite is not only cheaper and faster, but still competitive on reasoning and multimodal understanding for its tier.
Developer workflow implications
A notable product control is “thinking levels,” available in AI Studio and Vertex AI, allowing teams to tune reasoning depth per workload. Google highlights use cases ranging from high-volume translation and content moderation to UI/dashboard generation and simulation workflows. Early-access references include Latitude, Cartwheel, and Whering.
- Cost profile: optimized for high-throughput workloads with tight unit economics
- Latency profile: geared toward responsive, real-time product experiences
- Control profile: reasoning depth can be adjusted instead of fixed
Overall, Flash-Lite appears designed for production-scale deployment where response speed and per-request cost are as important as raw model capability.
Related Articles
Google DeepMind said on March 3, 2026 that Gemini 3.1 Flash-Lite delivers faster performance at a lower price than Gemini 2.5 Flash. Google is rolling the model out in preview via Google AI Studio and Vertex AI for high-volume, latency-sensitive workloads.
Google AI shared practical Gemini 3.1 Flash-Lite examples, including high-volume image sorting and business automation scenarios. The thread also points developers to preview access via Gemini API, Google AI Studio, and Vertex AI.
Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026 (UTC), calling it the most cost-efficient Gemini 3 model. Google’s companion blog post published pricing, latency claims, benchmark references, and preview availability in AI Studio and Vertex AI.
Comments (0)
No comments yet. Be the first to comment!