xAI opens its Text-to-Speech API with streaming, speech tags, and five voices
Original: Grok's Text to Speech API is now available. Start building with natural voices and expressive controls to bring your apps to life. http://x.ai/api/voice#text-to-speech
What xAI announced on X
On March 16, 2026, xAI said Grok's Text to Speech API is now available and pitched it as a way to build apps with natural voices and more expressive controls. The X post was brief, but it marked a clear expansion of xAI's public API surface beyond text and reasoning into deployable audio generation.
That matters because text-to-speech is not just a demo feature. Once an API reaches production use, it becomes infrastructure for voice assistants, narration, accessibility layers, call flows, and multimodal applications that need audio output with predictable latency and format control.
What the official voice docs specify
xAI's official voice documentation describes the Text to Speech API as a beta service at POST https://api.x.ai/v1/tts. The docs say the endpoint accepts up to 4,096 characters of text, supports inline speech tags for expressive delivery, and returns output in formats ranging from standard web audio to telephony-oriented codecs.
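The 4,096-character ceiling means longer scripts have to be split before they are sent. The following is a minimal sketch of one way to do that, not xAI's own method: the helper name split_for_tts is hypothetical, and it assumes the limit is counted in characters as Python's len() counts them, which the docs may define differently.

```python
import re

MAX_CHARS = 4096  # per-request text limit stated in the docs


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers sentence boundaries; falls back to a hard cut for any single
    sentence that is itself longer than the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut sentences that could never fit in one request.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Try to append the sentence to the chunk being built.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned piece can then be submitted as its own request and the resulting audio played back in order. Splitting at sentence boundaries is a design choice intended to keep pauses natural at the seams.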
- xAI's docs list five voices: eve, ara, leo, rex, and sal.
- Supported output options include mp3, wav, pcm, mulaw, and alaw, covering browser playback, raw pipelines, and call-center style telephony use cases.
- For real-time use, xAI also documents a streaming WebSocket endpoint at wss://api.x.ai/v1/tts, where audio is returned incrementally as base64-encoded chunks.
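The documented voices, formats, and base64 streaming can be illustrated with a short sketch that makes no network calls. It rests on several assumptions: the JSON field names "text", "voice", and "format" are hypothetical and not taken from xAI's schema, the helper names are invented for illustration, and each streamed chunk is assumed to be an independently decodable base64 string (the real WebSocket message envelope may wrap it differently).

```python
import base64
import json

TTS_URL = "https://api.x.ai/v1/tts"  # REST endpoint named in the docs
VOICES = {"eve", "ara", "leo", "rex", "sal"}  # voices listed in the docs
FORMATS = {"mp3", "wav", "pcm", "mulaw", "alaw"}  # documented output options
MAX_CHARS = 4096  # documented text limit


def build_tts_payload(text: str, voice: str = "eve", fmt: str = "mp3") -> str:
    """Validate inputs against the documented limits and return a JSON body.

    ASSUMPTION: the field names "text", "voice", and "format" are
    hypothetical; consult the xAI docs for the real request schema.
    """
    if not text or len(text) > MAX_CHARS:
        raise ValueError(f"text must be 1..{MAX_CHARS} characters")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return json.dumps({"text": text, "voice": voice, "format": fmt})


def assemble_audio(b64_chunks) -> bytes:
    """Decode base64 chunks in arrival order and join them into one buffer.

    ASSUMPTION: every chunk is a complete, padded base64 string.
    """
    return b"".join(base64.b64decode(chunk) for chunk in b64_chunks)
```

In a real client, the payload would be sent to TTS_URL with an API key, and assemble_audio would be fed chunks as they arrive over the WebSocket. A latency-sensitive player would likely decode and play each chunk immediately rather than wait for the full buffer.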
The broader voice overview page places this TTS surface alongside xAI's interactive Voice Agent API, which suggests xAI is building a layered voice stack: one endpoint for direct speech generation and another for full conversational agents.
Why this matters
For developers, the important point is control. A usable voice API needs more than a single synthetic voice and a downloadable file. It needs low-latency streaming, format choices that match deployment environments, and expressive controls for emphasis, pacing, and tone. xAI is explicitly trying to cover those requirements from the start.
Strategically, this moves xAI closer to the broader race for multimodal developer platforms. If Grok is going to appear in customer support, media generation, enterprise workflows, or agentic products, voice output has to be first-class infrastructure. The release does not settle questions about long-term pricing or production reliability, but it does show that xAI wants its API to compete on more than text alone.
Sources: xAI X post · xAI Text to Speech docs · xAI Voice overview