#tts

AI Hacker News Apr 28, 2026 2 min read

HN Pushes Back on Microsoft’s “Open-Source Frontier Voice AI” Framing

Hacker News did not treat VibeVoice as a straightforward launch post. The thread quickly turned into an audit of what was actually open, what had been pulled before, and whether the models are compelling enough to matter against existing voice stacks.

#microsoft #voice-ai #asr

LLM Reddit Apr 24, 2026 2 min read

LocalLLaMA Hears a Breakthrough in Qwen3 TTS: Real-Time, Local, and Finally Expressive

LocalLLaMA was impressed less by another TTS clip than by a build log. The post that took off showed Qwen3-TTS running locally in real time, quantized through llama.cpp, with extra alignment work to make subtitles and lip sync behave.

#qwen #tts #llama.cpp

AI Apr 16, 2026 2 min read

Gemini 3.1 Flash TTS adds audio tags and 70+ languages

Google’s new speech model moves control from hidden settings into the text itself: audio tags can steer style, pace, and delivery across 70+ languages. Gemini 3.1 Flash TTS is in preview through the Gemini API, Google AI Studio, and Vertex AI, and is rolling out to Google Vids users. It scores 1,211 Elo on Artificial Analysis and watermarks outputs with SynthID.

#gemini #tts #speech-ai

AI X/Twitter Apr 5, 2026 2 min read

Mistral launches Voxtral TTS as a low-latency multilingual speech layer for voice agents

Mistral AI said on March 26, 2026 that Voxtral TTS offers expressive speech, support for 9 languages and dialects, low latency, and easy adaptation to new voices. Mistral’s March 23 launch post says the 4B-parameter model can adapt from about three seconds of reference audio, reaches roughly 70ms model latency, supports up to two minutes of native audio generation, and is available by API and as open weights.

#mistral #tts #voice-agents

LLM Reddit Mar 29, 2026 3 min read

LocalLLaMA Highlights a Community Attempt to Restore Voice Cloning to Mistral’s Voxtral TTS

A March 2026 r/LocalLLaMA post with 123 points and 25 comments spotlighted `voxtral-voice-clone`, a project trying to train the missing codec encoder for Mistral’s Voxtral-4B-TTS-2603. The repo targets zero-shot cloning via `ref_audio`, which the original open-weight release could not support because the encoder weights were not included.

#tts #voice-cloning #mistral

AI Reddit Mar 27, 2026 2 min read

Mistral's Voxtral TTS puts open-weight speech generation back at the center of the local AI stack

A high-signal LocalLLaMA thread formed around Voxtral TTS because Mistral paired low latency, multilingual support, and open weights in a part of the stack many teams still keep closed.

#mistral #tts #speech

AI X/Twitter Mar 20, 2026 1 min read

LiveKit adds xAI TTS to Inference with 20+ languages and no separate API key

LiveKit said on X that xAI’s Grok text-to-speech is now available in LiveKit Inference with low-latency streaming, telephony readiness, and support for more than 20 languages. LiveKit’s docs say developers can access `xai/tts-1` through LiveKit Inference without a separate xAI API key or use the xAI plugin directly with `XAI_API_KEY`.

#livekit #xai #tts

AI Hacker News Mar 20, 2026 2 min read

Hacker News Spots a Tiny CPU-First TTS Release: Kitten TTS v0.8

Kitten TTS v0.8 drew Hacker News attention by promising ONNX-based speech synthesis from models of 15M to 80M parameters that can run locally on CPUs, while commenters stress-tested real-world usability.

#tts #onnx #edge-ai

AI Reddit Mar 15, 2026 2 min read

Fish Audio S2 Brings Inline Emotion Control and Fast Streaming to Open TTS

A March 9, 2026 LocalLLaMA discussion highlighted Fish Audio’s S2 release, which combines fine-grained inline speech control, multilingual coverage, and an SGLang-based streaming stack.

#tts #speech #audio

AI Reddit Mar 9, 2026 2 min read

r/LocalLLaMA: VoiceShelf Runs Kokoro TTS Offline on Android for EPUB Audiobooks

A well-received r/LocalLLaMA post described an Android app that turns EPUB books into spoken audio entirely on-device using Kokoro TTS. The project highlights how mobile inference speed, APK size, and thermal behavior now shape practical offline AI products.

#on-device-ai #tts #android

LLM Reddit Feb 23, 2026 1 min read

Qwen3's Hidden Gem: Voice Embeddings Enable Mathematical Voice Manipulation

Qwen3's TTS model encodes voices into 1024-dimensional vectors, enabling gender swapping, pitch adjustment, voice mixing, and semantic voice search through vector math. The encoder is now available as a standalone lightweight model on Hugging Face.

#qwen3 #tts #voice-embeddings
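
To make the "vector math" idea concrete, here is a minimal sketch in plain Python. It assumes only what the summary states, that a voice is a 1024-dimensional vector. The helper names (`mix`, `shift`, `cosine`, `nearest`) are hypothetical and are not part of any Qwen3 API, and the toy vectors stand in for real embeddings.

```python
import math

DIM = 1024  # embedding size stated in the summary; everything else is illustrative


def mix(a, b, weight=0.5):
    """Blend two voice vectors by linear interpolation (weight=1.0 returns b)."""
    return [(1 - weight) * x + weight * y for x, y in zip(a, b)]


def shift(voice, direction, amount):
    """Move a voice along a direction vector, e.g. a hypothetical 'pitch' axis."""
    return [v + amount * d for v, d in zip(voice, direction)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, library):
    """Return the name of the stored voice most similar to the query."""
    return max(library, key=lambda name: cosine(query, library[name]))


# Toy stand-ins for real embeddings: two orthogonal unit vectors.
voice_a = [1.0] + [0.0] * (DIM - 1)
voice_b = [0.0, 1.0] + [0.0] * (DIM - 2)
library = {"a": voice_a, "b": voice_b}

blend = mix(voice_a, voice_b, weight=0.9)  # mostly voice_b
closest = nearest(blend, library)
```

In this toy setup `closest` is `"b"`, since the blend is weighted toward `voice_b`. With real embeddings, a direction for `shift` could plausibly be estimated as the difference between the mean vectors of two groups of voices, though whether that yields clean gender or pitch control depends on the encoder.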

LLM Reddit Feb 20, 2026 2 min read

LocalLLaMA spotlights Kitten TTS v0.8 for compact on-device speech

A widely discussed LocalLLaMA post introduces the open Kitten TTS v0.8 models (80M/40M/14M), emphasizing CPU-friendly deployment and a sub-25MB footprint for the smallest variant.

#tts #localllama #edge-ai