Grok STT API targets voice apps with 25+ languages at $0.10/hour

Original: "Grok's Speech to Text API is now available. Instant, multi-speaker transcription across 25 languages - at the best price in the market." https://x.ai/news/grok-stt-and-tts-apis

AI · Apr 18, 2026 · By Insights AI

What The Tweet Changed

xAI's April 18, 2026 post moves Grok further away from being only a consumer chatbot and toward developer-facing voice infrastructure. The operative line was "Grok's Speech to Text API is now available." The same post promised multi-speaker transcription across 25 languages, while the linked xAI blog framed Speech to Text and Text to Speech as standalone APIs.

The pricing makes the launch more than a feature note. xAI lists batch transcription at $0.10/hour, streaming transcription at $0.20/hour, and Text to Speech at $4.20 per 1 million characters. For voice agents, contact centers, meeting notes, podcast editing, and accessibility tooling, cost and latency often decide whether a model gets tested beyond a demo.
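To make those list prices concrete, here is a minimal back-of-the-envelope sketch in Python. The three rates come from the pricing quoted above; the function names and the sample workload are hypothetical, and the sketch assumes simple linear billing with no minimum charges, rounding rules, or volume discounts, none of which the source describes.

```python
# Prices quoted in the article (USD). Billing details such as rounding,
# minimum charges, or discounts are not covered by the source and are ignored.
STT_BATCH_PER_HOUR = 0.10
STT_STREAMING_PER_HOUR = 0.20
TTS_PER_MILLION_CHARS = 4.20


def stt_cost(audio_hours: float, streaming: bool = False) -> float:
    """Estimated transcription cost in USD for the given hours of audio."""
    rate = STT_STREAMING_PER_HOUR if streaming else STT_BATCH_PER_HOUR
    return audio_hours * rate


def tts_cost(characters: int) -> float:
    """Estimated synthesis cost in USD for the given number of characters."""
    return characters / 1_000_000 * TTS_PER_MILLION_CHARS


if __name__ == "__main__":
    # Hypothetical monthly workload: 2,000 hours of recorded calls,
    # 500 hours of live calls, and 30 million characters of synthesized speech.
    monthly = (
        stt_cost(2_000)
        + stt_cost(500, streaming=True)
        + tts_cost(30_000_000)
    )
    print(f"Estimated monthly cost: ${monthly:,.2f}")
```

Under those assumptions the streaming rate is exactly double the batch rate, so the split between live and recorded audio is the main lever on the transcription bill.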

Account And Linked Context

@xai is the company's primary channel for Grok, Grok API, Colossus infrastructure, and voice product updates. The tweet matters because it expands the API surface rather than adding another in-app feature. The blog highlights word-level timestamps, speaker diarization, multi-channel support, and Inverse Text Normalization, which converts spoken numbers, dates, and currencies into structured text (for example, rendering "twenty five dollars" as "$25").

The benchmark section is also a clear competitive claim. xAI reports an overall Word Error Rate of 6.9% for Grok STT, compared with 9.0% for ElevenLabs, 11.0% for Deepgram, and 12.9% for AssemblyAI. Those are vendor-run numbers and need independent testing, but paired with public pricing they give developers enough detail to run an immediate cost and quality comparison.
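For teams that want to check the vendor figures on their own audio, Word Error Rate is straightforward to compute once a human-verified reference transcript exists. The sketch below uses the standard definition (substitutions, deletions, and insertions divided by the number of reference words) via word-level edit distance. The function name is illustrative, and the normalization is an assumption: it only lowercases and splits on whitespace, while published benchmarks typically also normalize punctuation and number formats, so results will not be directly comparable to xAI's reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and whitespace tokenization only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "please transfer two hundred dollars to my savings account"
    hypothesis = "please transfer two hundred dollars to savings account"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Running the same reference set through each provider and comparing the resulting rates alongside per-hour cost gives a like-for-like view that vendor-run benchmarks cannot.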

What To Watch

The next questions are real-time WebSocket latency, rate limits, and data retention. Voice APIs are not judged by transcript accuracy alone. Regulated customers will ask where audio is stored, how diarization errors are audited, and how partial streaming transcripts are revised while a call is still in progress. Grok STT's real competitiveness will depend on how clearly xAI exposes those controls in docs and the API console.
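The point about partial transcripts being revised mid-call can be illustrated with a small client-side buffer. This is a sketch under stated assumptions, not xAI's message schema: it assumes each streaming update carries a segment identifier, the current text for that segment, and a flag marking it final. The `TranscriptBuffer` class and its field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Keeps the latest text per segment; finalized segments are frozen."""

    segments: dict[int, str] = field(default_factory=dict)
    finalized: set[int] = field(default_factory=set)
    revisions: int = 0  # how often an interim segment's text was changed

    def apply(self, segment_id: int, text: str, is_final: bool) -> None:
        """Apply one (hypothetical) streaming update to the buffer."""
        if segment_id in self.finalized:
            return  # ignore late updates to a segment already marked final
        previous = self.segments.get(segment_id)
        if previous is not None and previous != text:
            self.revisions += 1
        self.segments[segment_id] = text
        if is_final:
            self.finalized.add(segment_id)

    def text(self) -> str:
        """Current best transcript, with segments in id order."""
        return " ".join(self.segments[k] for k in sorted(self.segments))


if __name__ == "__main__":
    buf = TranscriptBuffer()
    buf.apply(0, "I'd like to", is_final=False)
    buf.apply(0, "I'd like to cancel", is_final=False)
    buf.apply(0, "I'd like to cancel my order", is_final=True)
    buf.apply(1, "thanks", is_final=False)
    print(buf.text())
    print("revisions:", buf.revisions)
```

Counting revisions per segment is one simple way to audit how stable interim output is, which matters when downstream logic acts on text before it is final.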

Sources: source tweet, xAI blog.
