OpenAI launched three new voice API models: GPT-Realtime-2 for live reasoning, GPT-Realtime-Translate for real-time translation across 70+ languages, and GPT-Realtime-Whisper for streaming transcription. The Realtime API is now generally available for production use.
#voice-ai
xAI has launched Grok Voice Think Fast 1.0, a voice agent optimized for enterprise customer support scenarios, emphasizing low-latency responses and natural conversation flow.
OpenAI has added three new voice models with reasoning capabilities to the Realtime API, enabling developers to build low-latency voice applications powered by GPT-5-class intelligence.
OpenAI has launched GPT-Realtime-2 in its API, bringing GPT-5-class reasoning to real-time voice interactions. The release also includes GPT-Realtime-Translate for live multilingual speech translation and GPT-Realtime-Whisper for streaming transcription.
ElevenLabs disclosed $500M in ARR and $100M in net new ARR in Q1 2026 alone, as it added institutional backers including BlackRock, NVIDIA, and Deutsche Telekom to its $500M Series D originally announced in February.
Hacker News did not treat VibeVoice as a straightforward launch post. The thread quickly turned into an audit of what was actually open, what had been pulled before, and whether the models are compelling enough to matter against existing voice stacks.
Why it matters: xAI has turned the Grok Voice stack into standalone STT/TTS APIs with batch transcription at $0.10/hour and streaming at $0.20/hour. The post puts 25+ languages, diarization, and word-level timestamps in direct competition with enterprise transcription tools.
Google DeepMind said on March 26, 2026 that Gemini 3.1 Flash Live is rolling out in Gemini Live and Google Search Live, while developers can access it through Google AI Studio. Google’s announcement positions 3.1 Flash Live as its highest-quality audio model, with lower latency, improved tonal understanding, and benchmark gains including 90.8% on ComplexFuncBench Audio.
Google introduced Gemini 3.1 Flash Live on March 26, 2026 as its new real-time audio model for developers, enterprises, and consumer products. The release ties together the Gemini Live API, Gemini Enterprise for Customer Experience, Search Live, and Gemini Live around a single lower-latency voice stack.
A Launch HN thread pulled RunAnywhere’s MetalRT and RCLI into focus, centering attention on a low-latency STT-LLM-TTS stack that runs on Apple Silicon without cloud APIs.
A Launch HN thread pushed RunAnywhere's RCLI into view as an Apple Silicon-first macOS voice AI stack that combines STT, LLM, TTS, local RAG, and 38 system actions without relying on cloud APIs.
IBM and Deepgram said on Feb 24, 2026 that they are integrating Deepgram speech-to-text and text-to-speech into watsonx Orchestrate. Deepgram becomes IBM's first voice partner as IBM pushes voice AI deeper into enterprise agent workflows.