#websockets

LLM X/Twitter Apr 30, 2026 2 min read

Responses API WebSockets cut agent loop latency by up to 40%

Why it matters: a faster model stops feeling fast when orchestration overhead eats the gain. OpenAI says WebSocket mode made agent workflows up to 40% faster end to end, while lifting effective inference speed from about 65 to nearly 1,000 tokens per second.

#openai #responses-api #websockets

LLM Apr 23, 2026 2 min read

Responses API WebSockets make OpenAI agent loops up to 40% faster

The bottleneck moved from GPUs to the API layer, and OpenAI changed the transport to keep up. By adding WebSocket mode and connection-scoped caching to the Responses API, the company says agentic workflows ran up to 40% faster end to end and GPT-5.3-Codex-Spark reached 1,000 tokens per second, with bursts up to 4,000.

#openai #responses-api #websockets