AI Apr 16, 2026 2 min read

Google’s new speech model moves control from hidden settings into the text itself: audio tags can steer style, pace, and delivery across 70+ languages. Gemini 3.1 Flash TTS is in preview through the Gemini API, Google AI Studio, and Vertex AI, and also reaches Google Vids users. It scores 1,211 Elo on Artificial Analysis and watermarks its output with SynthID.

AI X Apr 5, 2026 2 min read

Mistral AI said on March 26, 2026 that Voxtral TTS offers expressive speech, support for 9 languages and dialects, low latency, and easy adaptation to new voices. Mistral’s March 23 launch post says the 4B-parameter model can adapt from about three seconds of reference audio, reaches roughly 70ms model latency, supports up to two minutes of native audio generation, and is available by API and as open weights.

LLM Reddit Mar 29, 2026 3 min read

A March 2026 r/LocalLLaMA post with 123 points and 25 comments spotlighted `voxtral-voice-clone`, a project trying to train the missing codec encoder for Mistral’s Voxtral-4B-TTS-2603. The repo targets zero-shot cloning via `ref_audio`, which the original open-weight release could not support because the encoder weights were not included.

AI X Mar 20, 2026 1 min read

LiveKit said on X that xAI’s Grok text-to-speech is now available in LiveKit Inference with low-latency streaming, telephony readiness, and support for more than 20 languages. LiveKit’s docs say developers can access `xai/tts-1` through LiveKit Inference without a separate xAI API key, or use the xAI plugin directly with their own `XAI_API_KEY`.

© 2026 Insights. All rights reserved.