Why it matters: xAI has turned the Grok Voice stack into standalone STT/TTS APIs with batch transcription at $0.10/hour and streaming at $0.20/hour. The post puts 25+ languages, diarization, and word-level timestamps in direct competition with enterprise transcription tools.
#voice-ai
Google DeepMind said on March 26, 2026 that Gemini 3.1 Flash Live is rolling out in Gemini Live and Google Search Live, while developers can access it through Google AI Studio. Google’s announcement positions 3.1 Flash Live as its highest-quality audio model, with lower latency, improved tonal understanding, and benchmark gains including 90.8% on ComplexFuncBench Audio.
Google introduced Gemini 3.1 Flash Live on Mar 26, 2026 as its new real-time audio model for developers, enterprises, and consumer products. The release ties together the Gemini Live API, Gemini Enterprise for Customer Experience, Search Live, and Gemini Live around a single lower-latency voice stack.
A Launch HN thread put RunAnywhere’s MetalRT and RCLI in focus, with attention centered on a low-latency STT-LLM-TTS stack that runs on Apple Silicon without cloud APIs.
A Launch HN thread pushed RunAnywhere's RCLI into view as an Apple Silicon-first macOS voice AI stack that combines STT, LLM, TTS, local RAG, and 38 system actions without relying on cloud APIs.
IBM and Deepgram said on Feb 24, 2026 that they are integrating Deepgram speech-to-text and text-to-speech into watsonx Orchestrate. Deepgram becomes IBM's first voice partner as IBM pushes voice AI deeper into enterprise agent workflows.