A 54-point Reddit post flagged the merge of PR #19441 as the moment qwen3-omni-moe and qwen3-asr support landed in llama.cpp, with commenters focused on local multimodal and ASR use cases.
#audio
Mistral AI said on March 26, 2026 that Voxtral TTS offers expressive speech, support for 9 languages and dialects, low latency, and easy adaptation to new voices. Mistral's March 23 launch post says the 4B-parameter model can adapt to a new voice from about three seconds of reference audio, reaches roughly 70ms model latency, supports up to two minutes of native audio generation, and is available via API and as open weights.
Mistral said on April 2, 2026 that developers can assemble a web-search-enabled speech-to-speech assistant in roughly 150 lines of code using Voxtral for transcription and speech generation plus Mistral Small 4 for agentic reasoning. The post is notable less as a single model launch than as a clear reference architecture for real-time audio agents.
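The architecture Mistral describes is a three-stage loop: transcribe incoming audio, reason over the text (optionally with web search), then synthesize the reply. A minimal sketch of that loop, with hypothetical stand-in functions rather than Mistral's actual API calls:

```python
# Hedged sketch of the transcribe -> reason -> synthesize loop described above.
# The stage callables are hypothetical placeholders; in the real assistant they
# would wrap Voxtral STT/TTS and Mistral Small 4 API calls.
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoiceAgent:
    transcribe: Callable[[bytes], str]   # Voxtral speech-to-text stand-in
    reason: Callable[[str], str]         # Mistral Small 4 stand-in (agentic step)
    synthesize: Callable[[str], bytes]   # Voxtral text-to-speech stand-in

    def turn(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: audio in, audio out."""
        text = self.transcribe(audio_in)
        reply = self.reason(text)
        return self.synthesize(reply)

# Stub stages so the skeleton runs without network access.
agent = VoiceAgent(
    transcribe=lambda audio: audio.decode(),
    reason=lambda text: f"echo: {text}",
    synthesize=lambda text: text.encode(),
)
print(agent.turn(b"what is the weather?"))
```

Keeping each stage behind a plain callable is what makes the real version compact: swapping the stubs for API clients changes the wiring, not the loop.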
xAI said on March 16, 2026 that Grok's Text-to-Speech API is now available. xAI's own voice docs describe a beta API with five voices, inline speech tags, telephony-friendly codecs, and a streaming WebSocket mode for low-latency applications.
Mistral has published Voxtral Realtime and Voxtral Mini Transcribe V2, adding sub-200ms streaming transcription, support for 13 languages, and open weights for the realtime model. The company paired the launch with an audio playground in Mistral Studio and aggressive API pricing at $0.003/min and $0.006/min.
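To put the per-minute pricing in perspective, the arithmetic is straightforward; a quick sketch using the two quoted rates:

```python
# Cost arithmetic for the per-minute API pricing quoted above.
def cost_usd(minutes: float, rate_per_min: float) -> float:
    """Total cost in USD for a given number of audio minutes at a per-minute rate."""
    return round(minutes * rate_per_min, 4)

# One hour of audio at each quoted rate:
print(cost_usd(60, 0.003))    # 0.18
print(cost_usd(60, 0.006))    # 0.36
# A full day (1440 minutes) of continuous transcription at the lower rate:
print(cost_usd(1440, 0.003))  # 4.32
```

At these rates, even round-the-clock transcription stays in single-digit dollars per day, which is the "aggressive" part of the pricing.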
A March 9, 2026 LocalLLaMA discussion highlighted Fish Audio’s S2 release, which combines fine-grained inline speech control, multilingual coverage, and an SGLang-based streaming stack.