Google DeepMind unveils Gemini 3.1 Flash-Lite as low-cost, high-speed model
Original post: "Gemini 3.1 Flash-Lite has landed as the most cost-efficient Gemini 3 model"
Launch signal from X and official blog
On March 3, 2026 (UTC), Google DeepMind posted on X that Gemini 3.1 Flash-Lite "has landed," describing it as the most cost-efficient model in the Gemini 3 series. At collection time, the post had recorded 7,804 likes, 267 replies, and 1,233,045 views. The announcement aligns with Google's detailed release note, "Gemini 3.1 Flash-Lite: Built for intelligence at scale."
Published economics and performance claims
Google prices Flash-Lite at $0.25 per million input tokens and $1.50 per million output tokens. The same post cites Artificial Analysis benchmarks claiming a 2.5× faster time to first answer token and a 45% increase in output speed over Gemini 2.5 Flash, while maintaining similar or better quality. Google also cites benchmark figures including an Elo of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro.
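At those rates, estimating a workload's cost is linear arithmetic over input and output token counts. A minimal sketch using the published prices (the 10M-input / 2M-output workload below is illustrative, not from the announcement):

```python
# Published Gemini 3.1 Flash-Lite rates, in USD per 1M tokens
INPUT_RATE_PER_M = 0.25
OUTPUT_RATE_PER_M = 1.50

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload at the published per-token rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Hypothetical batch job: 10M input tokens, 2M output tokens
print(cost_usd(10_000_000, 2_000_000))  # → 5.5
```

At scale the asymmetry matters: output tokens cost 6× more than input tokens, so summarization-style workloads (large input, small output) are the cheapest fit.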
Deployment implications for builders
The rollout is described as a preview via the Gemini API in Google AI Studio, with enterprise access through Vertex AI. Google positions the model for high-volume tasks such as translation, content moderation, and responsive app experiences where latency and cost are both hard constraints. In practice, the launch is a direct play for production workloads that need predictable cost profiles without giving up multimodal and reasoning coverage.
Sources: Google DeepMind X post, Google blog post