
LiveKit adds xAI TTS to Inference with 20+ languages and no separate API key

Original: LiveKit adds xAI text-to-speech to LiveKit Inference

AI | Mar 20, 2026 | By Insights AI

LiveKit said on X on March 16, 2026, that xAI’s Grok text-to-speech (TTS) is now available in LiveKit Inference. The post describes the integration as a low-latency, production-ready path for voice agents and highlights multilingual support, telephony readiness, and simpler access for developers building real-time voice systems.

The linked LiveKit documentation fills in the implementation details. It says xAI TTS can be used in LiveKit Agents in two ways: through LiveKit Inference or through a direct xAI plugin. On the managed Inference path, developers specify the model xai/tts-1 and do not need to provision a separate xAI API key, which reduces setup for teams already running their agents on LiveKit’s stack.
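LiveKit Agents documentation generally refers to Inference models with a provider/model string, and some examples append a voice after a colon. Assuming that convention, the sketch below is a hypothetical, stdlib-only helper that builds and parses such a descriptor for xai/tts-1. It does not call the LiveKit SDK, the class name is invented for illustration, and the voice name in the example is a placeholder rather than a documented xAI voice.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical helper, not part of the LiveKit SDK. It assumes the
# "provider/model[:voice]" descriptor convention for Inference models.
@dataclass(frozen=True)
class InferenceTTSDescriptor:
    provider: str
    model: str
    voice: Optional[str] = None

    def to_string(self) -> str:
        """Render as 'provider/model' or 'provider/model:voice'."""
        base = f"{self.provider}/{self.model}"
        return f"{base}:{self.voice}" if self.voice else base

    @classmethod
    def parse(cls, descriptor: str) -> "InferenceTTSDescriptor":
        """Split a descriptor string back into provider, model, and voice."""
        head, sep, voice = descriptor.partition(":")
        provider, slash, model = head.partition("/")
        if not slash or not provider or not model:
            raise ValueError(f"expected 'provider/model[:voice]', got {descriptor!r}")
        if sep and not voice:
            raise ValueError("voice suffix after ':' is empty")
        return cls(provider, model, voice or None)


if __name__ == "__main__":
    # "example-voice" is a placeholder, not a documented xAI voice name.
    d = InferenceTTSDescriptor("xai", "tts-1", "example-voice")
    print(d.to_string())
    print(InferenceTTSDescriptor.parse("xai/tts-1"))
```

Keeping the descriptor in a small frozen dataclass means a typo in the provider or model is caught when the string is parsed, before the agent tries to open a session.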

LiveKit also says the model supports more than 20 languages, including English, Japanese, Korean, Chinese, Hindi, Portuguese, Spanish, Turkish, and Vietnamese. The docs show that developers can select a voice directly in an AgentSession and can optionally pass language settings and other parameters through the inference TTS class. LiveKit is therefore presenting the integration as a first-class component of its agent framework rather than a generic wrapper.
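As a hedged illustration of the optional language setting, the sketch below assembles a hypothetical options dictionary for xai/tts-1 and normalizes a language tag such as "ja-JP" to its primary subtag. The option keys, the ISO 639-1 codes, and the function names are assumptions for illustration; the exact parameter names the inference TTS class accepts should be checked against the LiveKit docs. The language table covers only the nine languages named above, not the full 20+ list, so unknown codes are passed through unless strict checking is requested.

```python
from typing import Dict, Optional

# Languages named in this article, keyed by ISO 639-1 code. Assumption: the
# codes xai/tts-1 actually accepts may differ, and its full list is longer.
NAMED_LANGUAGES: Dict[str, str] = {
    "en": "English",
    "ja": "Japanese",
    "ko": "Korean",
    "zh": "Chinese",
    "hi": "Hindi",
    "pt": "Portuguese",
    "es": "Spanish",
    "tr": "Turkish",
    "vi": "Vietnamese",
}


def normalize_language(tag: str) -> str:
    """Reduce a tag such as 'ja-JP' or 'pt_BR' to its primary subtag."""
    primary = tag.strip().lower().replace("_", "-").split("-")[0]
    if not primary:
        raise ValueError(f"empty language tag: {tag!r}")
    return primary


def build_tts_options(voice: str, language: Optional[str] = None,
                      strict: bool = False) -> Dict[str, str]:
    """Assemble a hypothetical option dict for the xai/tts-1 model.

    With strict=True, languages outside NAMED_LANGUAGES are rejected;
    otherwise they are passed through, because the model supports more
    languages than this article lists.
    """
    options = {"model": "xai/tts-1", "voice": voice}
    if language is not None:
        code = normalize_language(language)
        if strict and code not in NAMED_LANGUAGES:
            raise ValueError(f"{language!r} is not among the named languages")
        options["language"] = code
    return options


if __name__ == "__main__":
    # "example-voice" is a placeholder, not a documented xAI voice name.
    print(build_tts_options("example-voice", "ja-JP"))
```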

For teams that want direct control, LiveKit also documents a separate plugin path that uses XAI_API_KEY and the livekit-agents[xai] package. The split matters because it lets developers choose between the convenience of LiveKit Inference and a direct vendor integration when they need their own authentication, billing, or deployment setup.
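The two documented paths differ mainly in where the credentials live. The sketch below is a minimal, hypothetical selector: it uses LiveKit Inference by default, picks the direct plugin path only when the caller asks for it, and fails fast if XAI_API_KEY is missing in that case. The function name and return values are illustrative assumptions; the plugin itself would be installed separately with pip install "livekit-agents[xai]".

```python
import os
from typing import Mapping, Optional

INFERENCE = "inference"  # managed path: model xai/tts-1, no XAI_API_KEY needed
PLUGIN = "plugin"        # direct path: livekit-agents[xai] plus your own XAI_API_KEY


def choose_tts_path(prefer_direct: bool,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return which xAI TTS integration path to use (hypothetical helper)."""
    env = os.environ if env is None else env
    if not prefer_direct:
        return INFERENCE
    if not env.get("XAI_API_KEY"):
        # Direct vendor integration needs your own xAI credentials.
        raise RuntimeError("prefer_direct=True but XAI_API_KEY is not set")
    return PLUGIN


if __name__ == "__main__":
    print(choose_tts_path(prefer_direct=False, env={}))
```

Raising instead of silently falling back keeps a missing key from quietly moving traffic, and billing, onto the other path.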

The significance is broader than one TTS connector. Voice agents are becoming more multimodal, more international, and more tightly integrated with phone systems and real-time application flows. By adding xAI TTS to LiveKit Inference, LiveKit makes it easier for developers to plug another frontier-model vendor into that stack without rebuilding their audio pipeline from scratch.

