LiveKit adds xAI TTS to Inference with 20+ languages and no separate API key
Original: LiveKit adds xAI text-to-speech to LiveKit Inference
LiveKit said on X on March 16, 2026 that xAI’s Grok text-to-speech is now available inside LiveKit Inference. The post describes the integration as a low-latency, production-ready path for voice agents, highlighting multilingual support, telephony readiness, and simpler access for developers building real-time voice systems.
The linked LiveKit documentation fills in the implementation details. It says xAI TTS is available through LiveKit Agents via both LiveKit Inference and a direct xAI plugin. For the managed path, developers can use the model xai/tts-1 without provisioning a separate xAI API key, which lowers the setup overhead for teams already running their agents on LiveKit’s stack.
LiveKit also says the model supports more than 20 languages, including English, Japanese, Korean, Chinese, Hindi, Portuguese, Spanish, Turkish, and Vietnamese. The docs show that developers can select a voice directly in an AgentSession and optionally pass language settings and other parameters through the inference TTS class. That makes the integration more than a generic wrapper. It is being presented as a first-class component inside the broader LiveKit agent framework.
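Based on the documentation's description of the managed path, agent setup might look like the sketch below. The model string `xai/tts-1` comes from the docs; the `voice` and `language` values, and the exact parameter names on the inference TTS class, are illustrative assumptions rather than confirmed API details.

```python
# Hedged sketch of the managed LiveKit Inference path. The docs state that
# no separate xAI API key is needed; requests ride on existing LiveKit
# credentials. Parameter names below are assumptions for illustration.
from livekit.agents import AgentSession, inference

# Simplest form: pass the documented model string directly to AgentSession;
# LiveKit Inference resolves the provider behind the scenes.
session = AgentSession(
    tts="xai/tts-1",
    # stt=..., llm=...  (elided; configure as usual for your agent)
)

# To select a voice or language, the docs describe using the inference TTS
# class instead. "eve" and "ja" are hypothetical placeholder values.
session = AgentSession(
    tts=inference.TTS(model="xai/tts-1", voice="eve", language="ja"),
)
```

Either form slots into an existing agent without changing the rest of the audio pipeline, which is the convenience the managed path is selling.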
For teams that want direct control, LiveKit also documents a separate plugin path that uses XAI_API_KEY and the livekit-agents[xai] package. That split is strategically important. It gives developers a choice between convenience through LiveKit Inference and direct vendor integration when they need their own authentication, billing, or custom deployment setup.
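The direct-plugin path might look like the following sketch. The package name `livekit-agents[xai]` and the `XAI_API_KEY` environment variable are taken from the docs; the plugin's class name and constructor arguments are assumptions for illustration.

```python
# Hedged sketch of the direct xAI plugin path: install livekit-agents[xai]
# and export XAI_API_KEY (both named in the docs). Unlike the managed path,
# authentication and billing here go straight to xAI.
import os

from livekit.agents import AgentSession
from livekit.plugins import xai  # provided by livekit-agents[xai]

# The plugin reads the vendor key from the environment.
assert os.environ.get("XAI_API_KEY"), "set XAI_API_KEY for the direct path"

session = AgentSession(
    tts=xai.TTS(),  # assumed constructor; exact parameters not confirmed
)
```

The trade-off is the one the article describes: the plugin gives you your own authentication and billing relationship with xAI, at the cost of managing another credential.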
The significance is broader than one TTS connector. Voice agents are becoming more multimodal, more international, and more tightly integrated with phone systems and real-time application flows. By adding xAI TTS into LiveKit Inference, LiveKit is making it easier for developers to plug another frontier-model vendor into that stack without rebuilding their audio pipeline from scratch.
Related Articles
A high-upvote LocalLLaMA thread highlighted KittenTTS v0.8, with community-shared details on 80M/40M/14M model variants, Apache-2.0 licensing, and an edge-friendly focus on local CPU inference.
Together AI said on March 12, 2026 that it is launching a one-cloud stack for real-time voice agents. Its public materials describe co-located STT, LLM, and TTS infrastructure with under-500ms latency, 25+ regions, and separate kernel work that cut time-to-first-64-tokens to 77ms in a voice-agent deployment.
A March 9, 2026 LocalLLaMA discussion highlighted Fish Audio’s S2 release, which combines fine-grained inline speech control, multilingual coverage, and an SGLang-based streaming stack.