Hacker News dissects a Claude Code quota dispute where prompt caching meets 1M-context agent workflows
Original: Pro Max 5x quota exhausted in 1.5 hours despite moderate usage
A GitHub issue filed on April 9, 2026 spilled onto Hacker News and became a broader developer argument about what actually burns Claude Code Max quota in heavy agentic workflows. The reporter said a Pro Max 5x plan was exhausted only 1.5 hours after a quota reset, and backed the claim with usage data pulled from session logs instead of a vague complaint about pricing.
The issue compares two windows. In the first, five hours of heavy development produced 2,715 API calls, 1,044M cache-read tokens, and 1.15M output tokens. In the second, a supposedly moderate 1.5-hour window still consumed 691 calls and 103.9M cache-read tokens across the main session plus background sessions. From that, the author proposed a specific hypothesis: cache_read tokens may be counted at full rate against quota, even though caching reduces cost on paper.
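The two windows are easier to compare once they are normalized per call and per hour. The sketch below does that arithmetic with the figures quoted in the issue. The `quota_weighted_tokens` helper and its `weight` parameter are hypothetical: they only illustrate the difference between the full-rate hypothesis (weight 1.0) and an assumed discounted rate (0.1 is an illustrative value, not a documented quota rule).

```python
from dataclasses import dataclass


@dataclass
class UsageWindow:
    """One usage window, using the figures reported in the issue."""
    label: str
    hours: float
    api_calls: int
    cache_read_tokens: int

    def cache_read_per_call(self) -> float:
        # Average cache-read tokens attributed to a single API call.
        return self.cache_read_tokens / self.api_calls

    def cache_read_per_hour(self) -> float:
        # Average cache-read tokens per hour of wall-clock time.
        return self.cache_read_tokens / self.hours


def quota_weighted_tokens(cache_read_tokens: int, weight: float) -> float:
    """Hypothetical quota charge: cache-read tokens scaled by a weight.

    weight=1.0 models the issue's hypothesis (full-rate counting).
    Any smaller weight is an assumption made for illustration only.
    """
    return cache_read_tokens * weight


heavy = UsageWindow("heavy, 5 h", 5.0, 2_715, 1_044_000_000)
moderate = UsageWindow("moderate, 1.5 h", 1.5, 691, 103_900_000)

for window in (heavy, moderate):
    print(
        f"{window.label}: "
        f"{window.cache_read_per_call():,.0f} cache-read tokens/call, "
        f"{window.cache_read_per_hour():,.0f} cache-read tokens/hour"
    )

# Compare the two accounting assumptions for the moderate window.
for weight in (1.0, 0.1):
    charged = quota_weighted_tokens(moderate.cache_read_tokens, weight)
    print(f"weight={weight}: {charged:,.0f} quota-weighted tokens")
```

On these numbers, the heavy window averages roughly 385k cache-read tokens per call and the moderate one roughly 150k, so even the "moderate" session re-reads a large cached prefix on almost every call. Whether that volume counts against quota at full rate is exactly the open question the issue raises.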
The writeup also points at two amplifiers. One is shared quota usage from sessions left running in other terminals. The other is the cost shape created by a 1M context window, where auto-compacts can trigger very large requests right before the context resets. If caching does not materially reduce quota accounting, a tool-heavy coding agent can become quota-bound surprisingly fast, especially once it starts reading lots of files, spawning helpers, and carrying long-running context forward.
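The effect of a large context window can be sketched with a toy model. It assumes that every call reads the whole current context from cache, that the context grows by a fixed amount per call, and that an auto-compact shrinks it once a limit is reached. The function name and all parameters are hypothetical, and this is not a description of how Claude Code actually meters usage.

```python
def simulate_cache_reads(
    calls: int,
    start_context: int,
    growth_per_call: int,
    context_limit: int,
    compact_to: int,
) -> tuple[int, int]:
    """Toy model of cumulative cache-read tokens in an agent loop.

    Assumption: each call re-reads the entire current context from cache,
    then the context grows. When it reaches context_limit, an auto-compact
    shrinks it to compact_to tokens.
    Returns (total_cache_read_tokens, number_of_compactions).
    """
    context = start_context
    total_cache_read = 0
    compactions = 0
    for _ in range(calls):
        total_cache_read += context      # whole prefix is read on this call
        context += growth_per_call       # tool output, file reads, replies
        if context >= context_limit:
            context = compact_to         # auto-compact resets the context
            compactions += 1
    return total_cache_read, compactions


# Same hypothetical workload under a 200k and a 1M context limit.
for limit in (200_000, 1_000_000):
    total, compactions = simulate_cache_reads(
        calls=500,
        start_context=20_000,
        growth_per_call=5_000,
        context_limit=limit,
        compact_to=50_000,
    )
    print(f"limit={limit:,}: {total:,} cache-read tokens, {compactions} compactions")
```

Under these assumptions the larger limit compacts less often but carries a much bigger prefix on the average call, so cumulative cache reads grow far faster for the same number of calls. That is the cost shape the writeup describes.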
The Hacker News discussion treated this as more than a one-off bug report. Boris from the Claude Code team joined the thread to clarify that the main agent typically uses a 1-hour cache while sub-agents typically use a 5-minute cache, but that clarification did not settle the accounting question raised by the issue. Commenters kept circling back to a more operational concern: once coding agents become part of a daily workflow, quota semantics, cache behavior, and per-session observability become product features, not implementation details. The thread matters because it frames the next bottleneck in agentic coding as predictability, not just raw model quality.
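The difference between a 1-hour and a 5-minute cache can be illustrated by counting how often a session pauses for longer than the cache lifetime. The sketch below assumes a cache entry is refreshed by each call and expires when the gap to the next call exceeds the TTL. The function name and the timestamps are hypothetical.

```python
def count_cache_expiries(call_times_s: list[float], ttl_s: float) -> int:
    """Count gaps between consecutive calls that exceed the cache TTL.

    Assumption: each call refreshes the cache entry, so the entry only
    expires when the idle time before the next call is longer than ttl_s.
    """
    expiries = 0
    for previous, current in zip(call_times_s, call_times_s[1:]):
        if current - previous > ttl_s:
            expiries += 1
    return expiries


# Hypothetical call timestamps in seconds since the session started.
call_times = [0, 120, 600, 4300, 4400]

five_minute_expiries = count_cache_expiries(call_times, ttl_s=5 * 60)
one_hour_expiries = count_cache_expiries(call_times, ttl_s=60 * 60)
print(f"5-minute TTL: {five_minute_expiries} expiries")
print(f"1-hour TTL: {one_hour_expiries} expiries")
```

With a short TTL, ordinary pauses such as reading a diff or waiting on a build are enough to let the cached prefix lapse, so the next call has to rebuild it instead of reading it.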