Google DeepMind unveils Gemini 3.1 Flash-Lite as low-cost, high-speed model
Original post: "Gemini 3.1 Flash-Lite has landed as the most cost-efficient Gemini 3 model"
Launch signal from X and official blog
On March 3, 2026 (UTC), Google DeepMind posted on X that Gemini 3.1 Flash-Lite "has landed," describing it as the most cost-efficient model in the Gemini 3 series. At collection time, the post had recorded 7,804 likes, 267 replies, and 1,233,045 views. The announcement aligns with Google's detailed release note, "Gemini 3.1 Flash-Lite: Built for intelligence at scale."
Published economics and performance claims
Google prices Flash-Lite at $0.25 per million input tokens and $1.50 per million output tokens. The same post cites Artificial Analysis benchmarks claiming a 2.5× faster time to first answer token and a 45% increase in output speed over Gemini 2.5 Flash, while maintaining similar or better quality. Google also cites benchmark figures including an Elo of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro.
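At those rates, estimating a workload's cost is linear arithmetic over input and output token counts. A minimal sketch using the published prices (the 10M-input / 2M-output workload below is illustrative, not from the announcement):

```python
# Published Gemini 3.1 Flash-Lite rates, in USD per 1M tokens
INPUT_RATE_PER_M = 0.25
OUTPUT_RATE_PER_M = 1.50

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload at the published per-token rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Hypothetical batch job: 10M input tokens, 2M output tokens
print(cost_usd(10_000_000, 2_000_000))  # → 5.5
```

At scale the asymmetry matters: output tokens cost 6× more than input tokens, so summarization-style workloads (large input, small output) are the cheapest fit.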
Deployment implications for builders
The rollout is described as a preview via the Gemini API in Google AI Studio, with enterprise access through Vertex AI. Google positions the model for high-volume tasks such as translation, content moderation, and responsive app experiences where latency and cost are both hard constraints. In practice, the launch is a direct play for production workloads that need predictable cost profiles without giving up multimodal and reasoning coverage.
Sources: Google DeepMind X post, Google blog post