Google DeepMind launches Gemini 3.1 Flash-Lite in preview

Launch signal from X and Google’s official write-up

Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026 (UTC), calling it the most cost-efficient model in the Gemini 3 series. The launch thread links to Google’s detailed product post. Source X post: nitter.net/GoogleDeepMind/status/2028872381477929185. Product details: blog.google/.../gemini-3-1-flash-lite.

Public numbers and positioning

Google says Flash-Lite is rolling out in preview via the Gemini API in Google AI Studio and via Vertex AI for enterprise users. Listed pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens. The company cites Artificial Analysis and claims better price-performance than Gemini 2.5 Flash, including a 2.5x faster time to first answer token and a 45% output speed increase.

Google also published benchmark snapshots: Elo 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. The message is that Flash-Lite is not only cheaper and faster, but still competitive on reasoning and multimodal understanding for its tier.

Developer workflow implications

A notable product control is “thinking levels,” available in AI Studio and Vertex AI, allowing teams to tune reasoning depth per workload. Google highlights use cases ranging from high-volume translation and content moderation to UI/dashboard generation and simulation workflows. Early-access references include Latitude, Cartwheel, and Whering.

Cost profile: optimized for high-throughput workloads with tight unit economics
Latency profile: geared toward responsive, real-time product experiences
Control profile: reasoning depth can be adjusted instead of fixed

Overall, Flash-Lite appears designed for production-scale deployment where response speed and per-request cost are as important as raw model capability.

LLM sources.twitter Mar 4, 2026 1 min read

Google DeepMind unveils Gemini 3.1 Flash-Lite as low-cost, high-speed model

Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026 (UTC), calling it the most cost-efficient Gemini 3 model. Google’s companion blog post published pricing, latency claims, benchmark references, and preview availability in AI Studio and Vertex AI.

#google-deepmind #gemini #flash-lite

LLM 3d ago 2 min read

DeepMind's Decoupled DiLoCo keeps frontier training alive through failures

Training a frontier model across far-flung data centers usually means paying a brutal synchronization tax. DeepMind says Decoupled DiLoCo cuts cross-site bandwidth from 198 Gbps to 0.84 Gbps in its eight-datacenter setup while holding benchmark ML accuracy near baseline at 64.1%.

#google-deepmind #diloco #llm-training

LLM 5d ago 2 min read

Google turns Deep Research into an MCP-native agent for finance and life sciences

Google has put Deep Research on Gemini 3.1 Pro, added MCP connections, and created a Max mode that searches more sources for harder research jobs. The April 21 preview targets finance and life sciences teams that need web evidence, uploaded files and licensed data in one workflow.

#google #gemini #mcp