Gemini 3.1 Flash TTS adds audio tags and 70+ languages
Original: Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Gemini 3.1 Flash TTS matters because speech models are no longer judged only by whether they sound clean. The harder question is control: can a developer ask a voice to slow down, shift tone, switch speakers, or keep a character consistent without building a separate production stack? In an April 15 post, Google said the new model brings audio tags to text-to-speech, letting instructions inside the input steer vocal style, pace, and delivery.
The rollout is broader than a research demo. Google says 3.1 Flash TTS is available in preview for developers through the Gemini API and Google AI Studio, in preview for enterprises on Vertex AI, and for Workspace users through Google Vids. That puts the same model across prototyping, enterprise deployment, and video creation workflows, which is exactly where voice agents and localized media production are beginning to overlap.
The hard numbers are the hook. Gemini 3.1 Flash TTS supports 70+ languages and posted an Elo score of 1,211 on the Artificial Analysis TTS leaderboard, which is built from blind human preferences. Google also points to native multi-speaker dialogue, Audio Profiles, Director's Notes, and inline tags as tools for directing speech output. In practical terms, the model is trying to turn a prompt into something closer to a voice performance brief.
The safety detail is not a footnote. Google says all audio generated by Gemini 3.1 Flash TTS is watermarked with SynthID, its imperceptible marker for identifying AI-generated media. The next test is whether those controls stay reliable outside polished demos: noisy scripts, long-form narration, multiple speakers, and languages with less commercial training data. Source: Google Keyword.
For developers, the important boundary is consistency. A short demo voice is easy to impress with; a product voice has to keep the same persona across retries, speaker changes, and localization passes. By putting Audio Profiles and inline instructions near the prompt, Google is trying to make that control inspectable instead of hiding it in a separate studio layer.
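Google's post does not include code, but the control model it describes (inline tags plus a multi-speaker script near the prompt) can be sketched. The tag names (`calm`, `excited`) and the model id `gemini-3.1-flash-tts` below are hypothetical; the commented-out request shape follows the existing google-genai Python SDK interface for Gemini 2.5 TTS models, which this release would presumably extend.

```python
# Sketch: steering delivery with inline audio tags in a multi-speaker script.
# Assumptions (not confirmed by Google's post): the tag names and the model
# id "gemini-3.1-flash-tts" are hypothetical placeholders.

def build_tagged_prompt(lines):
    """Join (speaker, tag, text) tuples into one tagged dialogue script."""
    return "\n".join(
        f"{speaker}: [{tag}] {text}" if tag else f"{speaker}: {text}"
        for speaker, tag, text in lines
    )

script = build_tagged_prompt([
    ("Narrator", "calm", "Welcome back to the show."),
    ("Guest", "excited", "Thanks, it's great to be here!"),
    ("Narrator", None, "Let's get started."),
])

# The actual request needs an API key; the shape below mirrors the current
# google-genai TTS interface for Gemini 2.5 models:
#
#   from google import genai
#   from google.genai import types
#   client = genai.Client()
#   resp = client.models.generate_content(
#       model="gemini-3.1-flash-tts",   # hypothetical model id
#       contents=script,
#       config=types.GenerateContentConfig(response_modalities=["AUDIO"]),
#   )

print(script)
```

The point of keeping tags inline rather than in a separate studio layer is exactly the inspectability the paragraph above describes: the performance brief travels with the prompt, so it survives retries and localization passes unchanged.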
Related Articles
A Hacker News thread surfaced a GitHub repo claiming it can detect and weaken Gemini image SynthID watermarks using signal processing alone. The more important debate was not the headline claim itself, but whether the project had been validated against Google's own detector and what that says about watermark-based provenance overall.
Google on April 8 began rolling out Gemini for Home early access in Japan. The update moves Google Home from fixed commands toward conversational control, AI camera summaries, and natural-language video search.
Google said on March 27, 2026 that Google Translate's Live translate with headphones is now available on iOS and expanding to more countries on both Android and iOS. Google's official product pages say the feature supports 70+ languages, works with any pair of headphones, and builds on Gemini speech-to-speech translation designed to preserve tone, emphasis, and cadence.