#latency

LLM X/Twitter Apr 30, 2026 2 min read

Responses API WebSockets cut agent loop latency by up to 40%

Why it matters: faster models stop feeling fast if orchestration overhead eats the gain. OpenAI says WebSocket mode made agent workflows up to 40% faster end to end, while lifting effective inference speed from about 65 to nearly 1,000 tokens per second.

#openai #responses-api #websockets

LLM X/Twitter Mar 5, 2026 1 min read

OpenAIDevs Announces /fast Mode: GPT-5.4 in Codex Runs 1.5x Faster

OpenAIDevs said Codex now supports a /fast mode where GPT-5.4 runs 1.5x faster while keeping the same intelligence and reasoning profile. The update targets faster coding iteration and debugging loops for developer workflows.

#codex #gpt-5-4 #developer-tools

AI Hacker News Mar 3, 2026 1 min read

Show HN: Building a Sub-500ms Latency Voice Agent from Scratch

Developer Nick Tikhonov shares how he built a voice AI agent achieving ~400ms end-to-end latency with a full STT → LLM → TTS pipeline, including clean barge-ins and no precomputed responses.

#voice-agent #ai #llm

LLM Hacker News Feb 16, 2026 1 min read

Two Paths to Faster LLM Inference: Batch Strategy vs Specialized Compute

A widely discussed Hacker News post compares Anthropic and OpenAI fast modes and argues that LLM speed gains are increasingly driven by serving architecture, not just model quality.

#llm #inference #latency