OpenAI and Perplexity share production lessons from scaling voice agents with the Realtime API
Original: 📣 Lessons from building voice agents at scale @perplexity_ai breaks down how running voice with the Realtime API in production shaped their approach to context, audio pipelines, and turn-taking in real-world environments. developers.openai.com/blog/r…
What OpenAI and Perplexity described
OpenAI Developers said on March 30, 2026, that Perplexity has published a case study on building voice agents at scale with the Realtime API. In the official write-up, OpenAI and Perplexity say Perplexity uses Realtime-1.5 in production across products such as Perplexity Comet and Perplexity Computer, and that the company now handles millions of voice sessions every month. The article frames voice as a core interface, not a side feature, because users want to hand off work conversationally and watch an agent complete it.
The most valuable part of the post is that it focuses on operational lessons instead of launch marketing. Perplexity explains that the hard part was not just getting speech in and audio out. It was making long-running, tool-using voice agents stay stable when context grows, clients send different native audio buffers, and users speak in noisy environments with interruptions, hesitations, and mid-task corrections.
What changed in production
One concrete lesson involved context management. Perplexity says its early approach tried to push large transcript updates, but that failed in an all-or-nothing way. If a 10,000-token update arrived when the model had room for only 5,000 more tokens, the system could lose all prior history at once. The team changed course and began feeding context in 2,000-token chunks instead, accepting some overhead in exchange for more graceful truncation and more stable interactions.
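The difference between the two approaches can be illustrated with a toy model. The sketch below is not Perplexity's code and does not call the Realtime API. It assumes, purely for illustration, that the context holds at most 15,000 tokens and that overflow is handled by evicting the oldest whole items first. The names ToyContext, chunk_sizes, and surviving_history are hypothetical. Only the 2,000-token chunk size and the 10,000-token update come from the article.

```python
from collections import deque

CHUNK_TOKENS = 2_000  # chunk size quoted in the article


def chunk_sizes(total_tokens, chunk_tokens=CHUNK_TOKENS):
    """Return the sizes of consecutive chunks that cover total_tokens."""
    full, rest = divmod(total_tokens, chunk_tokens)
    return [chunk_tokens] * full + ([rest] if rest else [])


class ToyContext:
    """Toy context buffer. Assumed policy: evict oldest whole items first."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.items = deque()  # entries are (label, token_count)
        self.used = 0

    def append(self, label, token_count):
        self.items.append((label, token_count))
        self.used += token_count
        # Drop whole items from the front until the budget is respected,
        # always keeping the item that was just added.
        while self.used > self.max_tokens and len(self.items) > 1:
            _, dropped = self.items.popleft()
            self.used -= dropped

    def tokens_with_label(self, label):
        return sum(n for name, n in self.items if name == label)


def surviving_history(history_sizes, update_sizes, max_tokens=15_000):
    """Return how many history tokens remain after the update is added."""
    ctx = ToyContext(max_tokens)
    for n in history_sizes:
        ctx.append("history", n)
    for n in update_sizes:
        ctx.append("update", n)
    return ctx.tokens_with_label("history")


# One 10,000-token history item plus one 10,000-token update:
# the only way to make room is to drop the entire history.
print(surviving_history([10_000], [10_000]))  # 0

# The same content in 2,000-token chunks: only the oldest chunks go.
print(surviving_history(chunk_sizes(10_000), chunk_sizes(10_000)))  # 4000
```

Under these assumptions the chunked version still loses the oldest material, but it loses it 2,000 tokens at a time instead of all at once, which is the more graceful truncation the article describes.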
The post also emphasizes that conversation semantics matter as much as raw token count. Perplexity found that if too much browsing context was inserted under the user role, the model behaved as if the user had literally spoken every page fragment out loud. If too much was inserted under the system role, the model blurred the line between inherent knowledge, provided context, and the actual question. The team says getting these roles right was essential for making voice interactions feel natural.
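The article does not say which role assignment Perplexity settled on, so the following is only one possible way to keep the three sources apart. It is a minimal sketch using plain Python objects rather than any real API payload. Message, wrap_context, build_turn, and the provided_context delimiter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str  # "system" or "user"
    text: str


def wrap_context(source, text):
    """Mark text as supplied context so it is not mistaken for speech."""
    return f'<provided_context source="{source}">\n{text}\n</provided_context>'


def build_turn(instructions, page_fragments, spoken_text, context_role="system"):
    """Assemble one turn while keeping three sources distinct:
    standing instructions, browsing context, and what the user said."""
    if context_role not in ("system", "user"):
        raise ValueError("context_role must be 'system' or 'user'")
    messages = [Message("system", instructions)]
    for source, text in page_fragments:
        # Context is always wrapped, whichever role ends up carrying it.
        messages.append(Message(context_role, wrap_context(source, text)))
    # Only the words the user actually spoke go in as bare user text.
    messages.append(Message("user", spoken_text))
    return messages


turn = build_turn(
    instructions="Answer briefly and cite the provided context when used.",
    page_fragments=[("example.com/pricing", "Plan A costs $10 per month.")],
    spoken_text="How much is the cheapest plan?",
)
for message in turn:
    print(message.role, "|", message.text.splitlines()[0])
```

The design choice here is that the role carrying the context is a parameter, while the explicit wrapper is fixed. That makes it possible to experiment with role placement, which the article identifies as the sensitive part, without ever presenting page text as the user's own words.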
A separate lesson came from audio infrastructure. Perplexity operates across clients built in Swift, TypeScript, Rust, and C++, and the company says inconsistent native audio buffers created uneven performance. Standardizing audio across product surfaces reduced that mismatch. The article also stresses the need to tune for messy real-world environments, where the model has to handle interruptions, background noise, and turn-taking without losing responsiveness.
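One way to picture audio standardization is a small normalization layer that every client feeds before audio leaves the device. The sketch below is an assumption-laden illustration, not Perplexity's pipeline. It assumes the target is mono 16-bit little-endian PCM at 24 kHz in 20 ms frames, and that input is already at the target sample rate (resampling is out of scope). Reframer and float_to_pcm16 are hypothetical names.

```python
import struct

SAMPLE_RATE_HZ = 24_000  # assumed target sample rate
FRAME_MS = 20            # assumed frame duration
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 480 samples per frame


def float_to_pcm16(samples):
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


class Reframer:
    """Accept PCM16 buffers of any length and emit fixed-size frames."""

    def __init__(self, frame_samples=FRAME_SAMPLES):
        self.frame_bytes = frame_samples * 2  # 2 bytes per int16 sample
        self.pending = bytearray()

    def push(self, pcm16_bytes):
        """Add a native buffer and return every complete frame available."""
        self.pending.extend(pcm16_bytes)
        frames = []
        while len(self.pending) >= self.frame_bytes:
            frames.append(bytes(self.pending[:self.frame_bytes]))
            del self.pending[:self.frame_bytes]
        return frames


# Two clients with different native buffer sizes produce identical framing.
reframer = Reframer()
first = reframer.push(float_to_pcm16([0.0] * 300))   # too short for a frame
second = reframer.push(float_to_pcm16([0.5] * 700))  # completes two frames
print(len(first), len(second), len(reframer.pending) // 2)  # 0 2 40
```

Whatever buffer size a platform's audio stack hands over, downstream code only ever sees frames of one size and one sample format, which is the kind of mismatch reduction the article attributes to standardization.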
Why it matters
This case study matters because it shows where production voice agents are actually fragile. The bottleneck is not only model quality. It is context management, message semantics, audio normalization, and interaction design under imperfect conditions. Those are the engineering details that turn a good voice demo into a voice system people can trust for regular use.
For developers building agent products, the broader takeaway is that voice is becoming infrastructure. Once teams operate at the scale Perplexity describes, design choices like chunk size, role labeling, and tool selection stop being implementation trivia and become product-defining architecture. That is a useful reality check for any company treating real-time multimodal agents as the next major interface layer.