Grok STT API targets voice apps with 25 languages at $0.10/hour for batch transcription
Original: "Grok's Speech to Text API is now available. Instant, multi-speaker transcription across 25 languages - at the best price in the market." https://x.ai/news/grok-stt-and-tts-apis
What The Tweet Changed
xAI's April 18, 2026 post moves Grok further away from a consumer chatbot and toward developer-facing voice infrastructure. The operative line was "Grok's Speech to Text API is now available." The same post promised instant, multi-speaker transcription across 25 languages, while the linked xAI blog framed Speech to Text and Text to Speech as standalone APIs.
The pricing makes the launch more than a feature note. xAI lists batch transcription at $0.10/hour, streaming transcription at $0.20/hour, and Text to Speech at $4.20 per 1 million characters. For voice agents, contact centers, meeting notes, podcast editing, and accessibility tooling, cost and latency often decide whether a model gets tested beyond a demo.
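To make those rates concrete, here is a minimal sketch of a monthly cost estimate. Only the three prices come from the xAI listing quoted above; the function name `estimate_monthly_cost` and the sample workload are hypothetical, and the sketch ignores anything the article does not cover, such as minimum billing increments, free tiers, or volume discounts.

```python
# Rates as quoted in the article (USD). Everything else here is illustrative.
BATCH_STT_PER_HOUR = 0.10       # batch transcription, per audio hour
STREAMING_STT_PER_HOUR = 0.20   # streaming transcription, per audio hour
TTS_PER_MILLION_CHARS = 4.20    # Text to Speech, per 1 million characters


def estimate_monthly_cost(batch_hours=0.0, streaming_hours=0.0, tts_chars=0):
    """Return a USD cost breakdown for a hypothetical monthly workload."""
    batch = batch_hours * BATCH_STT_PER_HOUR
    streaming = streaming_hours * STREAMING_STT_PER_HOUR
    tts = tts_chars / 1_000_000 * TTS_PER_MILLION_CHARS
    return {
        "batch_stt": round(batch, 2),
        "streaming_stt": round(streaming, 2),
        "tts": round(tts, 2),
        "total": round(batch + streaming + tts, 2),
    }


if __name__ == "__main__":
    # Hypothetical workload: 1,000 hours of recorded audio, 500 hours of
    # live calls, and 2 million characters of synthesized speech.
    print(estimate_monthly_cost(batch_hours=1000, streaming_hours=500, tts_chars=2_000_000))
```

Under these assumptions the streaming rate is exactly twice the batch rate, so a team that can tolerate delayed transcripts would halve its transcription bill by batching.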
Account And Linked Context
@xai is the company's primary channel for Grok, Grok API, Colossus infrastructure, and voice product updates. This tweet matters because it expands the API surface rather than just adding an app button. The blog highlights word-level timestamps, speaker diarization, multi-channel support, and Inverse Text Normalization, which converts spoken numbers, dates, and currencies into structured text (for example, rendering "twenty five dollars" as "$25").
The benchmark section is also a clear competitive claim. xAI reports an overall Word Error Rate of 6.9% for Grok STT, compared with 9.0% for ElevenLabs, 11.0% for Deepgram, and 12.9% for AssemblyAI. Those are vendor-run numbers and need independent testing, but paired with public pricing they give developers enough detail to run an immediate cost and quality comparison.
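For teams that want to check the vendor numbers on their own audio, Word Error Rate is simple to compute: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a generic implementation of that standard definition, not xAI's benchmark harness. The function name `word_error_rate` is hypothetical, and the normalization is deliberately naive (lowercasing and whitespace splitting only), so scores will differ from any benchmark that also normalizes punctuation or numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps"
    hypothesis = "the quick brown box jumps"
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Running the same function over transcripts from each provider, on the same audio and with the same normalization, gives a like-for-like comparison that the published figures alone cannot.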
What To Watch
The next questions are real-time WebSocket latency, rate limits, and data retention. Voice APIs are not judged by transcript accuracy alone. Regulated customers will ask where audio is stored, how diarization errors are audited, and how partial streaming transcripts are revised while a call is still in progress. Grok STT's real competitiveness will depend on how clearly xAI exposes those controls in docs and the API console.
Sources: source tweet, xAI blog.