Show HN: Building a Sub-500ms Latency Voice Agent from Scratch

400ms Voice AI: What It Takes

Developer Nick Tikhonov shared a Show HN project (122 upvotes) detailing how he built a voice agent averaging ~400ms end-to-end latency, measured from the moment the caller stops speaking to the first syllable of the reply. It runs a complete speech-to-text (STT) → LLM → text-to-speech (TTS) pipeline with clean barge-ins and no precomputed responses.

What Actually Moved the Needle

Semantic End-of-Turn Detection: Voice activity detection (VAD) alone fails for natural conversation. The agent needs semantic understanding of when someone is truly done speaking.
Streaming is Non-Negotiable: Sequential pipelines are dead on arrival. STT, LLM, and TTS must all stream.
TTFT Dominates: Groq's ~80ms time-to-first-token (TTFT) was the single biggest performance win.
Geography Over Prompts: Colocating all components mattered more than any prompt optimization.
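
To see why streaming and time-to-first-token weigh so heavily, it helps to add up a rough latency budget. The sketch below is illustrative arithmetic, not the project's code. The `Stage` type and both helper functions are hypothetical, and every number except the ~80ms LLM time-to-first-token is a made-up placeholder. It also assumes, optimistically, that each stage can start working as soon as the previous one emits its first chunk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    first_output_ms: float  # time until the stage emits its first chunk
    total_ms: float         # time until the stage has emitted everything


def sequential_first_audio_ms(stages):
    """Each stage waits for the complete output of the one before it."""
    *upstream, last = stages
    return sum(s.total_ms for s in upstream) + last.first_output_ms


def streaming_first_audio_ms(stages):
    """Each stage starts on the first chunk from the one before it."""
    return sum(s.first_output_ms for s in stages)


# Placeholder numbers for illustration only. The 80 ms LLM figure echoes
# the TTFT quoted above; the STT and TTS values are assumptions.
STAGES = [
    Stage("stt_finalize", first_output_ms=100, total_ms=100),
    Stage("llm", first_output_ms=80, total_ms=1200),
    Stage("tts", first_output_ms=150, total_ms=2500),
]

if __name__ == "__main__":
    print("sequential:", sequential_first_audio_ms(STAGES), "ms")
    print("streaming: ", streaming_first_audio_ms(STAGES), "ms")
```

With these placeholder values, the sequential pipeline pays for the LLM's full generation time before any audio plays, while the streaming pipeline pays only for each stage's first output. That is why a lower TTFT translates almost directly into a faster first syllable.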

The Core Loop

The system reduces to two states — speaking vs. listening — and two critical transitions: cancel instantly on barge-in, respond instantly on end-of-turn. These transitions define the entire user experience. Voice is fundamentally a turn-taking problem, not a transcription problem.
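
The two-state loop described above can be sketched as a small state machine. This is a minimal illustration rather than the project's actual code. The names `TurnLoop`, `State`, and `Event` are hypothetical, as are the `start_response` and `cancel_response` callbacks, and the `RESPONSE_DONE` event is an added assumption so the agent can return to listening after it finishes speaking uninterrupted.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


class Event(Enum):
    END_OF_TURN = auto()    # the user has finished speaking
    BARGE_IN = auto()       # the user started talking over the agent
    RESPONSE_DONE = auto()  # assumption: the agent finished its reply


class TurnLoop:
    """Hypothetical turn-taking controller with two states."""

    def __init__(self, start_response, cancel_response):
        self.state = State.LISTENING
        self._start_response = start_response
        self._cancel_response = cancel_response

    def handle(self, event):
        if self.state is State.LISTENING and event is Event.END_OF_TURN:
            # Respond immediately on end-of-turn.
            self._start_response()
            self.state = State.SPEAKING
        elif self.state is State.SPEAKING and event is Event.BARGE_IN:
            # Cancel generation and playback immediately on barge-in.
            self._cancel_response()
            self.state = State.LISTENING
        elif self.state is State.SPEAKING and event is Event.RESPONSE_DONE:
            self.state = State.LISTENING
        # Any other (state, event) pair is ignored.
        return self.state


if __name__ == "__main__":
    loop = TurnLoop(lambda: print("start reply"), lambda: print("cancel reply"))
    for ev in (Event.END_OF_TURN, Event.BARGE_IN, Event.END_OF_TURN):
        print(ev.name, "->", loop.handle(ev).name)
```

In a real agent the callbacks would start or tear down the LLM and TTS streams. Keeping them as plain callables here keeps the transition logic easy to test in isolation.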

Open Source

The project is available on GitHub as 'shuo'. For developers building real-time voice AI systems, it offers a practical reference implementation for achieving sub-500ms conversational latency.
