Google DeepMind launches Gemini 3.1 Flash-Lite in preview
Original: Gemini 3.1 Flash-Lite has landed. It’s our most cost-efficient Gemini 3 series model yet, built for intelligence at scale. Here’s what’s new 🧵
Launch signal from X and Google’s official write-up
Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026 (UTC), calling it the most cost-efficient model in the Gemini 3 series. The launch thread links to Google’s detailed product post. Source X post: nitter.net/GoogleDeepMind/status/2028872381477929185. Product details: blog.google/.../gemini-3-1-flash-lite.
Public numbers and positioning
Google says Flash-Lite is rolling out in preview to developers through the Gemini API in Google AI Studio and to enterprise users through Vertex AI. Listed pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens. Citing Artificial Analysis benchmarks, the company claims better price-performance than Gemini 2.5 Flash, including a 2.5x faster time to first answer token and a 45% increase in output speed.
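To make the listed prices concrete, here is a minimal sketch of the per-request arithmetic. The two price constants come from the figures above. The workload (request volume and token counts) and the function name are assumptions chosen for illustration, not numbers from Google.

```python
# Listed preview prices from the article, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 0.25
OUTPUT_USD_PER_MTOK = 1.50


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-token prices."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Hypothetical workload (an assumption, not from the article):
# 1,000,000 moderation requests per day, each with 800 input tokens
# and 50 output tokens.
per_request = request_cost_usd(800, 50)
daily = per_request * 1_000_000
print(f"per request: ${per_request:.6f}")
print(f"per day:     ${daily:,.2f}")
```

Under these assumed numbers, each request costs about $0.000275, or roughly $275 per day. Output tokens cost six times as much as input tokens, so workloads with long responses will be dominated by the output price.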
Google also published benchmark snapshots: Elo 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. The message is that Flash-Lite is not only cheaper and faster, but still competitive on reasoning and multimodal understanding for its tier.
Developer workflow implications
A notable product control is “thinking levels,” available in AI Studio and Vertex AI, allowing teams to tune reasoning depth per workload. Google highlights use cases ranging from high-volume translation and content moderation to UI/dashboard generation and simulation workflows. Early-access references include Latitude, Cartwheel, and Whering.
- Cost profile: optimized for high-throughput workloads with tight unit economics
- Latency profile: geared toward responsive, real-time product experiences
- Control profile: reasoning depth can be adjusted instead of fixed
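One way a team might use adjustable reasoning depth is a simple per-workload policy table. The sketch below is illustrative only: the level labels, workload keys, and the `level_for` helper are hypothetical, and it makes no API calls. Check the AI Studio or Vertex AI documentation for the actual parameter name and accepted values.

```python
from enum import Enum


class ThinkingLevel(Enum):
    # Hypothetical labels; the real accepted values may differ.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical policy mapping the use cases named in the article
# to a reasoning depth. The assignments are assumptions.
WORKLOAD_LEVELS = {
    "translation": ThinkingLevel.LOW,
    "content_moderation": ThinkingLevel.LOW,
    "ui_generation": ThinkingLevel.MEDIUM,
    "simulation": ThinkingLevel.HIGH,
}


def level_for(workload: str) -> ThinkingLevel:
    """Return the configured level, defaulting to LOW for unknown workloads."""
    return WORKLOAD_LEVELS.get(workload, ThinkingLevel.LOW)


print(level_for("simulation").value)
print(level_for("translation").value)
```

Defaulting unknown workloads to the lowest level keeps cost and latency predictable. A team that values answer quality over unit cost might choose a higher default instead.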
Overall, Flash-Lite appears designed for production-scale deployment where response speed and per-request cost are as important as raw model capability.