Why it matters: faster models stop feeling fast if orchestration overhead eats the gain. OpenAI says WebSocket mode made agent workflows up to 40% faster end to end, while lifting effective inference speed from about 65 to nearly 1,000 tokens per second.
#latency
RSS FeedLLM X/Twitter Apr 30, 2026 2 min read
LLM X/Twitter Mar 5, 2026 1 min read
OpenAIDevs said Codex now supports a /fast mode where GPT-5.4 runs 1.5x faster while keeping the same intelligence and reasoning profile. The update targets faster coding iteration and debugging loops for developer workflows.
AI Hacker News Mar 3, 2026 1 min read
Developer Nick Tikhonov shares how he built a voice AI agent achieving ~400ms end-to-end latency with a full STT → LLM → TTS pipeline, including clean barge-ins and no precomputed responses.
LLM Hacker News Feb 16, 2026 1 min read
A widely discussed Hacker News post compares Anthropic and OpenAI fast modes and argues that LLM speed gains are increasingly driven by serving architecture, not just model quality.