The bottleneck moved from GPUs to the API layer, and OpenAI changed the transport to keep up. With WebSocket mode and connection-scoped caching added to the Responses API, the company says end-to-end performance of agentic workflows improved by up to 40%, and GPT-5.3-Codex-Spark reached 1,000 tokens per second with bursts up to 4,000.
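A rough sketch of what sending Responses requests over a persistent WebSocket could look like. The endpoint URL, event name, and message fields below are illustrative assumptions, not confirmed schema; the item above only states that a WebSocket mode with connection-scoped caching was added.

```python
import json

# Hypothetical WebSocket endpoint for the Responses API (assumption).
WS_URL = "wss://api.openai.com/v1/responses"

def build_response_request(model: str, prompt: str) -> str:
    """Serialize one Responses request to send over an already-open socket.

    With connection-scoped caching, repeated context sent on the same
    connection would not need to be re-processed in full on each turn.
    """
    return json.dumps({
        "type": "response.create",  # hypothetical event name
        "model": model,
        "input": prompt,
    })

# Each agent turn reuses the open connection instead of a fresh HTTP request.
msg = build_response_request("gpt-5.3-codex-spark", "Summarize the diff.")
payload = json.loads(msg)
```

The point of the persistent connection is amortization: handshake, auth, and cached context are paid once per connection rather than once per request.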
#responses-api
OpenAI on March 11, 2026 detailed how it combines the Responses API with a shell tool and hosted containers to give agents a managed computer environment. The company says the design is meant to make file handling, tool execution, network access, and long-running workflows easier to run in production.
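A minimal sketch of a Responses API request that enables a hosted shell tool for an agent, assuming the tool is declared in the request's `tools` array; the exact tool type string and fields are assumptions based on the description above, not confirmed API schema.

```python
def shell_agent_request(model: str, task: str) -> dict:
    """Build a Responses request whose agent can execute shell commands
    inside a managed, hosted container rather than on the caller's machine."""
    return {
        "model": model,
        "input": task,
        # "shell" as the tool type is an assumed name for illustration.
        "tools": [{"type": "shell"}],
    }

req = shell_agent_request("gpt-5.3-codex", "Run the test suite and summarize failures.")
```

Keeping execution in a hosted container is what makes file handling and network access manageable in production: the environment is provisioned, sandboxed, and torn down by the platform, not by the caller.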
OpenAI Developers said on March 21, 2026 that container startup for skills, hosted shell, and code interpreter was about 10x faster via a new container pool in the Responses API. Updated OpenAI shell docs show hosted shell can create containers automatically, reuse active containers by reference, and keep them alive for up to 20 minutes of inactivity.
OpenAI Developers published a March 11, 2026 engineering write-up explaining how the Responses API uses a hosted computer environment for long-running agent workflows. The post centers on shell execution, hosted containers, controlled network access, reusable skills, and native compaction for context management.
OpenAIDevs posted on February 24, 2026 that GPT-5.3-Codex is now available for all developers in the Responses API. The announcement moves API access from a staged rollout to general developer availability.