OpenAI launched GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper — new voice API models covering live reasoning, real-time translation across 70+ languages, and streaming transcription. The Realtime API is now generally available for production use.
#api
Anthropic's Claude Platform is now generally available on AWS, offering full Claude API feature parity with AWS IAM authentication, CloudTrail audit logging, and a single AWS invoice that retires against existing commitments.
Google has updated the Gemini API File Search tool to support multimodal content including images, audio, and video, making it easier for developers to build efficient, verifiable RAG systems.
OpenAI has launched GPT-Realtime-2 in its API, bringing GPT-5-class reasoning to real-time voice interactions. The release also includes GPT-Realtime-Translate for live multilingual speech translation and GPT-Realtime-Whisper for streaming transcription.
xAI has released Grok 4.3 on its API, claiming top spots on agentic tool calling and instruction-following leaderboards, and ranking #1 in enterprise domains such as case law and corporate finance. It supports a 1M token context window at $1.25/M input and $2.50/M output.
A benchmark comparing vision agents (browser-use) to structured API agents on the same admin panel found vision agents cost roughly 45x more — and failed to complete the task without a 14-step explicit walkthrough.
xAI officially launched Voice Cloning through its API, allowing users to clone a custom voice in under 2 minutes or select from 80+ pre-built voices across 28 languages for voice agents, audiobooks, and game characters.
HN did not open with applause for GPT-5.5. The thread went straight to pricing, context tiers, and whether the model actually behaves better once real coding work starts.
Why it matters: API availability is the moment a flagship model becomes something teams can actually wire into products. OpenAI’s developer account says GPT-5.5 brings fewer retries, and the official release page now lists API access with a 1M context window and updated pricing.
xAI is turning voice agents into production software, not a demo. Grok Voice Think Fast 1.0 tops τ-voice Bench, supports 25+ languages, and xAI says the same stack drives a Starlink flow with a 20% sales conversion rate and a 70% support resolution rate.
Sakana AI is trying to sell orchestration itself as a model product, not just a prompt hack around other APIs. In its beta table, fugu-ultra posts 54.2 on SWEPro and 95.1 on GPQAD while shipping behind an OpenAI-compatible API.
Why it matters: xAI has turned the Grok Voice stack into standalone STT/TTS APIs with batch transcription at $0.10/hour and streaming at $0.20/hour. The post puts 25+ languages, diarization, and word-level timestamps in direct competition with enterprise transcription tools.
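To put the quoted transcription rates in perspective, here is a rough back-of-the-envelope sketch in Python. Only the two per-hour prices come from the post; the function name, the rounding to cents, and the assumption that billing is strictly linear in audio hours (no minimums, tiers, or per-request fees) are illustrative guesses, not xAI's documented billing rules.

```python
# Rates quoted in the post (USD per hour of audio).
BATCH_RATE_PER_HOUR = 0.10
STREAMING_RATE_PER_HOUR = 0.20


def transcription_cost(audio_hours: float, rate_per_hour: float) -> float:
    """Estimate cost in USD, assuming simple linear per-hour billing.

    Hypothetical helper: real billing may round or meter differently.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return round(audio_hours * rate_per_hour, 2)


if __name__ == "__main__":
    hours = 500  # example monthly volume, chosen arbitrarily
    print("batch:", transcription_cost(hours, BATCH_RATE_PER_HOUR))
    print("streaming:", transcription_cost(hours, STREAMING_RATE_PER_HOUR))
```

Under these assumptions, streaming costs exactly twice as much as batch for the same audio volume, so the choice mostly comes down to whether the workload needs live results.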