A high-signal LocalLLaMA thread formed around Voxtral TTS because Mistral paired low latency, multilingual support, and open weights in a part of the stack many teams still keep closed.
#speech
LiveKit said on March 19, 2026 that it trained an audio model that can distinguish real user interruptions from backchannels and other noise. The company's blog says the feature is now generally available in LiveKit Agents, delivers 86% precision and 100% recall at 500 ms of overlapping speech, and is enabled by default in the current Python and TypeScript agent SDKs.
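Those figures map onto a standard confusion-matrix trade-off: 100% recall means no real interruption is missed, while 86% precision means some backchannels get misflagged as interruptions. A minimal sketch of how the two numbers relate (the labeled events below are made up for illustration, not LiveKit's evaluation data):

```python
def precision_recall(pairs):
    """Compute precision and recall from (predicted, actual) boolean pairs,
    where True means 'real interruption' and False means 'backchannel/noise'."""
    tp = sum(1 for pred, actual in pairs if pred and actual)
    fp = sum(1 for pred, actual in pairs if pred and not actual)
    fn = sum(1 for pred, actual in pairs if not pred and actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative outcomes: every real interruption is caught (no false
# negatives -> 100% recall), but a few backchannels are misflagged
# (false positives pull precision down to 86%).
events = [(True, True)] * 43 + [(True, False)] * 7 + [(False, False)] * 50
p, r = precision_recall(events)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=86% recall=100%
```

For a voice agent, this is usually the right side of the trade-off to err on: a missed interruption (low recall) leaves the agent talking over the user, while an occasional false positive just pauses the agent briefly.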
Kitten TTS v0.8 drew Hacker News attention by promising ONNX-based speech synthesis with 15M- to 80M-parameter models that run locally on CPUs, while commenters stress-tested real-world usability.
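As a rough feasibility check (my arithmetic, not Kitten TTS's published figures), those parameter counts imply weight footprints small enough for almost any CPU-only machine:

```python
def model_size_mb(params: int, bytes_per_param: int) -> float:
    """Approximate in-memory weight size; ignores activations and runtime overhead."""
    return params * bytes_per_param / (1024 ** 2)

for params in (15_000_000, 80_000_000):
    for label, nbytes in (("fp32", 4), ("fp16", 2), ("int8", 1)):
        size = model_size_mb(params, nbytes)
        print(f"{params / 1e6:.0f}M params @ {label}: ~{size:.0f} MB")
```

Even the 80M model at fp32 is around 300 MB of weights, which is why local CPU inference is plausible here; the commenter stress tests would then hinge on synthesis quality and real-time factor rather than memory.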
Mistral has published Voxtral Realtime and Voxtral Mini Transcribe V2, adding sub-200ms streaming transcription, 13-language support, and open weights for the realtime model. The company also paired the launch with an audio playground in Mistral Studio and aggressive API pricing at $0.003/min and $0.006/min.
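At per-minute rates, transcription cost scales linearly with audio volume, so budgeting is simple arithmetic. A quick back-of-the-envelope using the posted rates (the usage figures below are hypothetical):

```python
def monthly_cost(minutes_per_day: float, days: int, rate_per_min: float) -> float:
    """Linear cost model: total audio minutes times the per-minute API rate."""
    return minutes_per_day * days * rate_per_min

# Example workload: 8 hours of audio per day, 30 days a month.
for rate in (0.003, 0.006):
    print(f"${rate}/min -> ${monthly_cost(8 * 60, 30, rate):.2f}/month")
```

At these rates, even a full workday of continuous audio lands in the tens of dollars per month, which is the kind of pricing pressure the "aggressive" label refers to.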
A March 9, 2026 LocalLLaMA discussion highlighted Fish Audio’s S2 release, which combines fine-grained inline speech control, multilingual coverage, and an SGLang-based streaming stack.
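"Fine-grained inline speech control" generally means style markers embedded directly in the input text, so one request can switch delivery mid-utterance. A hypothetical parser for such markup (the `(tag)` syntax here is illustrative only, not Fish Audio's actual format):

```python
import re

TAG = re.compile(r"\((\w+)\)")

def parse_inline_controls(text: str, default: str = "neutral"):
    """Split text into (style, segment) pairs based on inline (tag) markers."""
    segments = []
    style = default
    pos = 0
    for m in TAG.finditer(text):
        chunk = text[pos:m.start()].strip()
        if chunk:
            segments.append((style, chunk))
        style = m.group(1)  # the tag sets the style for everything that follows
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((style, tail))
    return segments

print(parse_inline_controls(
    "Hello there. (whisper) Don't tell anyone. (excited) We shipped!"
))
```

A streaming TTS stack would consume these (style, text) pairs one at a time, which is also what makes low-latency synthesis with mid-sentence style changes possible.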
IBM unveiled Granite 4.0 1B Speech on March 9, 2026 as a compact multilingual speech-language model for ASR and bidirectional speech translation. The company says it improves English transcription accuracy over its predecessor while cutting model size in half and adding Japanese support.
Developer Nick Tikhonov shares how he built a voice AI agent achieving ~400ms end-to-end latency with a full STT → LLM → TTS pipeline, including clean barge-ins and no precomputed responses.