LiveKit makes adaptive interruption handling generally available for voice agents

Original: How can a voice agent tell when you’re actually interrupting it? VAD is too sensitive—laughs, “mm-hmm,” or a sneeze shouldn’t stop the agent. We trained an audio model for adaptive interruption handling so agents can distinguish real interruptions from noise. View original →

AI · Mar 23, 2026 · By Insights AI · 2 min read

What LiveKit posted on X

On March 19, 2026, LiveKit framed a common voice-agent failure in simple terms: VAD is too sensitive. Laughs, backchannels like “mm-hmm,” sneezes, and other incidental sounds should not cause an agent to stop speaking as if the user has fully barged in. LiveKit said it trained an audio model for adaptive interruption handling so agents can distinguish real interruptions from noise.
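The failure mode is easy to reproduce: an energy-only gate cannot tell a short transient from real speech. A minimal sketch of that problem (a hand-rolled RMS gate, not any LiveKit component):

```python
import numpy as np

def naive_vad_fires(frame: np.ndarray, threshold: float = 0.02) -> bool:
    """Bare RMS-energy gate: any frame above the threshold counts as 'speech'.
    Energy alone cannot tell a sneeze or a backchannel from a real barge-in."""
    rms = float(np.sqrt(np.mean(frame ** 2)))
    return rms > threshold

sr = 16_000
t = np.arange(sr // 10) / sr        # one 100 ms frame

burst = 0.3 * np.exp(-t * 40)       # sharp, fast-decaying transient (sneeze-like)
quiet = np.full_like(t, 0.001)      # mild background hiss

print(naive_vad_fires(burst))       # True -> the agent would stop mid-sentence
print(naive_vad_fires(quiet))       # False -> correctly ignored
```

The burst is loud for only a few tens of milliseconds, yet the gate treats it exactly like a user starting to speak; that is the over-sensitivity LiveKit is describing.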

That sounds like a narrow UX fix, but it addresses one of the hardest real-time problems in conversational AI: turn-taking. Voice agents feel unnatural very quickly when they either talk over the user or stop too eagerly at every stray sound.

What the LiveKit blog adds

The linked LiveKit post says Adaptive Interruption Handling is now generally available in LiveKit Agents. Instead of relying on simple VAD alone, the new system runs an audio-based interruption model during the first few hundred milliseconds of detected user speech. According to LiveKit, the model looks at waveform shape, onset sharpness, signal duration, and prosodic features such as pitch and rhythm to decide whether the user is actually beginning a new utterance.
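To make that decision concrete, here is a toy stand-in for it: hand-rolled heuristics over a ~300 ms window. The real system is a trained audio model; the feature names and thresholds below are assumptions for illustration only.

```python
import numpy as np

def looks_like_barge_in(frame: np.ndarray, sr: int = 16_000) -> bool:
    """Toy proxy for the feature families LiveKit describes (onset sharpness,
    signal duration). Illustrative only, not LiveKit's model."""
    env = np.abs(frame)
    # Duration above a noise floor, in ms: sustained speech stays energetic,
    # while sneezes and coughs decay quickly.
    active_ms = 1000.0 * np.count_nonzero(env > 0.01) / sr
    # Onset sharpness: transients spike almost instantly; speech ramps up.
    head = env[: sr // 100]                  # first 10 ms
    sharp_onset = bool(head.max() > 0.25)
    return active_ms >= 150 and not sharp_onset

sr = 16_000
t = np.arange(int(0.3 * sr)) / sr            # 300 ms analysis window

# Speech-like: a 200 Hz tone that ramps up over 50 ms and sustains.
speech_like = 0.2 * np.sin(2 * np.pi * 200 * t) * np.minimum(t / 0.05, 1.0)
# Transient: a sharp burst that decays within ~85 ms.
transient = 0.3 * np.exp(-t * 40)

print(looks_like_barge_in(speech_like))  # True  -> treat as a real interruption
print(looks_like_barge_in(transient))    # False -> keep talking
```

The point of the sketch is the shape of the decision, not the features themselves: within a few hundred milliseconds the system must commit to either yielding the turn or continuing.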

LiveKit says it trained the model on hundreds of hours of natural human-to-human conversations, then enriched the data with noise to reflect real-world conditions. The company also says the model is multilingual and generalizes to languages it did not explicitly see during training.

The benchmark section gives concrete performance numbers. LiveKit reports 86% precision and 100% recall at 500 ms of overlapping speech, rejection of 51% of VAD-based false-positive barge-ins, faster true barge-in detection than VAD in 64% of cases, and inference in 30 ms or less. The median amount of audio needed to trigger an interruption was 216 ms.
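The two headline numbers pin down the error profile. With hypothetical confusion counts chosen only to match the reported figures (these are not LiveKit's raw data):

```python
# Hypothetical counts consistent with 86% precision / 100% recall.
tp = 86   # real barge-ins correctly detected
fp = 14   # non-interruptions (laughs, backchannels) wrongly accepted
fn = 0    # 100% recall means no genuine barge-in was missed

precision = tp / (tp + fp)   # 86/100 = 0.86
recall = tp / (tp + fn)      # 86/86  = 1.0
print(precision, recall)
```

In other words, at this operating point the model never ignores a user who genuinely wants to interrupt; the residual cost is that roughly one in seven accepted barge-ins is still a false positive.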

Operationally, the feature is enabled by default in Python Agents v1.5.0+ and TypeScript Agents v1.2.0+. LiveKit says every agent deployed on LiveKit Cloud gets it automatically at no extra cost, while self-hosted users receive 40,000 inference requests per month across plans.
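For teams on those versions there is nothing to wire up, but interruption behavior can still be tuned per session. A rough sketch against the 1.x Python `AgentSession` API; the parameter names here are assumptions based on that API, so verify them against the current LiveKit Agents docs before relying on them.

```python
from livekit.agents import AgentSession

# Assumed AgentSession knobs (check your installed version's signature):
session = AgentSession(
    allow_interruptions=True,        # permit barge-ins at all
    min_interruption_duration=0.5,   # seconds of user speech before it counts as a barge-in
)
```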

Why this matters

Many voice-agent demos look impressive until people start talking naturally. Real conversations include partial acknowledgements, filled pauses, laughter, coughs, and background noise. Handling those cases well is what separates a system that merely speaks from one that can participate in a conversation.

If LiveKit's results translate into production voice apps, the improvement is bigger than a smoother demo. Better interruption handling reduces accidental turn breaks, improves perceived latency, and makes downstream agent logic easier to trust because fewer conversations are derailed by false stops.

Sources: LiveKit X post · LiveKit blog


Related Articles

AI · 3d ago · 1 min read

LiveKit said on X that xAI’s Grok text-to-speech is now available in LiveKit Inference with low-latency streaming, telephony readiness, and support for more than 20 languages. LiveKit’s docs say developers can access `xai/tts-1` through LiveKit Inference without a separate xAI API key or use the xAI plugin directly with `XAI_API_KEY`.


© 2026 Insights. All rights reserved.