OpenAI Releases Three New Realtime Voice Models Including GPT-5-Class Reasoning

LLM | May 13, 2026 | By Insights AI

Three New Models

OpenAI on May 7 released three realtime audio models designed for developers building a new class of voice apps. Each targets a distinct use case:

  • GPT-Realtime-2: OpenAI's first realtime voice model with GPT-5-class reasoning. Built for live voice interactions where the model handles complex requests, calls tools, and manages interruptions while keeping the conversation flowing naturally. It scores 15.2% higher on Big Bench Audio than GPT-Realtime-1.5.
  • GPT-Realtime-Translate: Live translation model that converts speech from 70+ input languages to 13 output languages at the speaker's pace.
  • GPT-Realtime-Whisper: Streaming speech-to-text that transcribes as the speaker talks, rather than batch processing after they stop.

Pricing

GPT-Realtime-2 (high quality) is priced at $32 per 1M audio input tokens ($0.40 per 1M for cached input) and $64 per 1M audio output tokens. GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute.
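
To make the two billing styles concrete, here is a rough cost sketch in Python using only the prices quoted above. The function names are hypothetical, the cached rate is assumed to be per 1M tokens like the other token prices, and the token counts in the usage lines are made-up illustrations, not measurements of how many tokens a minute of audio consumes.

```python
from decimal import Decimal

# Prices as quoted in this article (USD).
REALTIME2_INPUT_PER_M = Decimal("32")     # per 1M audio input tokens
REALTIME2_CACHED_PER_M = Decimal("0.40")  # assumption: also per 1M tokens
REALTIME2_OUTPUT_PER_M = Decimal("64")    # per 1M audio output tokens
TRANSLATE_PER_MIN = Decimal("0.034")      # GPT-Realtime-Translate
WHISPER_PER_MIN = Decimal("0.017")        # GPT-Realtime-Whisper

MILLION = Decimal(1_000_000)


def realtime2_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate GPT-Realtime-2 audio cost from token counts.

    input_tokens counts uncached input only; cached_tokens is billed
    separately at the cached rate.
    """
    return (
        Decimal(input_tokens) * REALTIME2_INPUT_PER_M
        + Decimal(cached_tokens) * REALTIME2_CACHED_PER_M
        + Decimal(output_tokens) * REALTIME2_OUTPUT_PER_M
    ) / MILLION


def per_minute_cost(minutes, rate_per_minute):
    """Estimate cost for the per-minute models (Translate, Whisper)."""
    return Decimal(str(minutes)) * rate_per_minute


# Hypothetical workloads, for illustration only.
session = realtime2_cost(input_tokens=1_000_000, output_tokens=500_000)
translate_hour = per_minute_cost(60, TRANSLATE_PER_MIN)
whisper_hour = per_minute_cost(60, WHISPER_PER_MIN)
```

The practical difference is that the per-minute models scale with wall-clock audio length, while GPT-Realtime-2 scales with token volume, so its cost depends on how much of the input is cached and how much the model speaks.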

Significance

Bringing GPT-5-class reasoning to a realtime voice model is a meaningful step toward voice agents that can handle complex, multi-step tasks rather than only simple commands. Full details are in the OpenAI announcement.
