Google DeepMind launches Gemini 3.1 Flash-Lite in preview

Original: Gemini 3.1 Flash-Lite has landed. It’s our most cost-efficient Gemini 3 series model yet, built for intelligence at scale. Here’s what’s new 🧵

LLM, Mar 6, 2026, By Insights AI

Launch signal from X and Google’s official write-up

Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026 (UTC), calling it the most cost-efficient model in the Gemini 3 series. The launch thread links to Google’s detailed product post. Source X post: nitter.net/GoogleDeepMind/status/2028872381477929185. Product details: blog.google/.../gemini-3-1-flash-lite.

Public numbers and positioning

Google says Flash-Lite is rolling out in preview via the Gemini API in Google AI Studio and via Vertex AI for enterprise users. Listed pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens. The company cites Artificial Analysis and claims better price-performance than Gemini 2.5 Flash, including a 2.5x faster time to first answer token and a 45% output speed increase.
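To make the listed figures concrete, here is a minimal sketch of the arithmetic. The two prices and the 2.5x and 45% figures come from the paragraph above; the function names, the example workload, and the baseline latency numbers are hypothetical, and reading "2.5x faster" as dividing time to first token by 2.5 is an assumption rather than Google's stated method.

```python
# Listed preview pricing, in USD per 1M tokens (from the article).
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at the listed per-token prices."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def estimate_response_seconds(base_ttft_s: float, base_tokens_per_s: float,
                              output_tokens: int) -> float:
    """Rough response time, scaling a baseline by the claimed speedups.

    Assumption: "2.5x faster" means time to first token is divided by 2.5,
    and "45% output speed increase" means tokens per second times 1.45.
    """
    ttft_s = base_ttft_s / 2.5
    tokens_per_s = base_tokens_per_s * 1.45
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    # Hypothetical moderation workload: 10M requests per day,
    # 400 input tokens and 20 output tokens per request.
    requests = 10_000_000
    daily = estimate_cost_usd(requests * 400, requests * 20)
    print(f"Estimated daily cost: ${daily:,.2f}")

    # Hypothetical baseline: 1.0 s to first token, 100 tokens/s.
    seconds = estimate_response_seconds(1.0, 100.0, 290)
    print(f"Estimated response time: {seconds:.2f} s")
```

Because output tokens cost six times as much as input tokens at these prices, short-output workloads such as classification or moderation benefit most from the pricing.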

Google also published benchmark snapshots: Elo 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. The message is that Flash-Lite is not only cheaper and faster, but still competitive on reasoning and multimodal understanding for its tier.

Developer workflow implications

A notable product control is “thinking levels,” available in AI Studio and Vertex AI, allowing teams to tune reasoning depth per workload. Google highlights use cases ranging from high-volume translation and content moderation to UI/dashboard generation and simulation workflows. Early-access references include Latitude, Cartwheel, and Whering.

  • Cost profile: optimized for high-throughput workloads with tight unit economics
  • Latency profile: geared toward responsive, real-time product experiences
  • Control profile: reasoning depth can be adjusted instead of fixed

Overall, Flash-Lite appears designed for production-scale deployment where response speed and per-request cost are as important as raw model capability.

