Google DeepMind launches Gemini 3.1 Flash-Lite in preview
Original: Gemini 3.1 Flash-Lite has landed. It’s our most cost-efficient Gemini 3 series model yet, built for intelligence at scale. Here’s what’s new 🧵 View original →
Launch signal from X and Google’s official write-up
Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026 (UTC), calling it the most cost-efficient model in the Gemini 3 series. The launch thread links to Google’s detailed product post. Source X post: nitter.net/GoogleDeepMind/status/2028872381477929185. Product details: blog.google/.../gemini-3-1-flash-lite.
Public numbers and positioning
Google says Flash-Lite is rolling out in preview via the Gemini API in Google AI Studio and via Vertex AI for enterprise users. Listed pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens. The company cites Artificial Analysis and claims better price-performance than Gemini 2.5 Flash, including a 2.5x faster time to first answer token and a 45% output speed increase.
Google also published benchmark snapshots: Elo 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. The message is that Flash-Lite is not only cheaper and faster, but still competitive on reasoning and multimodal understanding for its tier.
Developer workflow implications
A notable product control is “thinking levels,” available in AI Studio and Vertex AI, allowing teams to tune reasoning depth per workload. Google highlights use cases ranging from high-volume translation and content moderation to UI/dashboard generation and simulation workflows. Early-access references include Latitude, Cartwheel, and Whering.
- Cost profile: optimized for high-throughput workloads with tight unit economics
- Latency profile: geared toward responsive, real-time product experiences
- Control profile: reasoning depth can be adjusted instead of fixed
Overall, Flash-Lite appears designed for production-scale deployment where response speed and per-request cost are as important as raw model capability.
Related Articles
Google launched Gemini 3.5 Flash at I/O 2026 on May 19, making it generally available the same day. It outperforms Gemini 3.1 Pro on coding and agentic benchmarks while running 4x faster at 40% lower cost.
Google DeepMind has released Gemini 3.1 Pro with over 2x reasoning performance versus Gemini 3 Pro. The model scores 77.1% on ARC-AGI-2 (up from 31.1%), 80.6% on SWE-bench Verified, and tops 12 of 18 tracked benchmarks at unchanged $2/$12 per million token pricing.
Google DeepMind announced Gemini 3.1 Pro on February 19, 2026 as an upgraded core model for harder tasks. The company highlighted a verified 77.1% score on ARC-AGI-2 and broad rollout across developer, enterprise, and consumer surfaces.