HN Flags an Anthropic Cache TTL Regression That Could Raise Claude Code Costs
Original: Anthropic downgraded cache TTL on March 6th
A Hacker News discussion on April 12 centered on GitHub issue #46829, where a Claude Code user argues that Anthropic's prompt-cache behavior changed materially in early March 2026. The key claim is not that Claude Code updated locally, but that the service-side default for cache time-to-live appears to have shifted from 1 hour to 5 minutes, which would make repeated long-context sessions much more expensive.
The issue cites 119,866 API calls collected from Jan. 11 to April 11, 2026, across two machines and two usage patterns. Because Claude Code logs separate counters for ephemeral_5m_input_tokens and ephemeral_1h_input_tokens, the author says the TTL tier is visible in the raw JSONL files. In the shared data, Feb. 1 through March 5 was almost entirely 1-hour cache creation, March 6 was mixed, and by March 8 five-minute cache writes had become dominant.
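A rough way to reproduce that kind of per-day breakdown is to sum the two counters by calendar day. The Python sketch below is not the issue author's script. It assumes each JSONL record carries a top-level ISO 8601 "timestamp" string, and it searches the whole record for the two counter names rather than assuming where they are nested; both the function names and the record layout are illustrative assumptions.

```python
import json
from collections import defaultdict

# Counter names as quoted in the issue; everything else here is an assumption.
TIER_KEYS = ("ephemeral_5m_input_tokens", "ephemeral_1h_input_tokens")


def find_counters(node, totals):
    """Recursively add the two TTL counters to totals wherever they appear."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in TIER_KEYS and isinstance(value, (int, float)):
                totals[key] += value
            else:
                find_counters(value, totals)
    elif isinstance(node, list):
        for item in node:
            find_counters(item, totals)


def tally_by_day(lines):
    """Return {"YYYY-MM-DD": {counter_name: tokens}} from JSONL lines."""
    per_day = defaultdict(lambda: dict.fromkeys(TIER_KEYS, 0))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(record, dict):
            continue
        # Assumed field: a top-level ISO 8601 "timestamp" string.
        timestamp = record.get("timestamp")
        if not isinstance(timestamp, str):
            continue
        find_counters(record, per_day[timestamp[:10]])
    return dict(per_day)
```

Feeding it the lines of whichever session files are available yields one row per day, which makes a shift from 1-hour to 5-minute writes easy to spot by eye.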
What makes the thread important to developers is the operational impact. The report estimates a 20% to 32% increase in cache-creation cost and a noticeable jump in quota burn for subscription users, and it explicitly links the behavior to another quota-exhaustion complaint discussed in the Claude Code issue tracker. That still does not prove Anthropic intentionally changed the default, but it does show how much hidden server-side behavior can surface in tooling logs before a vendor publishes a changelog.
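To see why a shorter TTL can raise spend even if each individual cache write is cheaper, consider a toy model of one session. The sketch below is not the report's methodology and does not reproduce its 20% to 32% figure. It assumes a fixed cached prefix, a cache lifetime that restarts on every hit, and write and read price multipliers passed in by the caller; the multipliers in the example are placeholders to be checked against current pricing.

```python
def session_cost(prefix_tokens, gaps_minutes, ttl_minutes, write_mult, read_mult):
    """Cost of the cached prefix over a session, in base-input-token equivalents.

    gaps_minutes lists the idle time before each follow-up request.
    All multipliers are caller-supplied assumptions, not quoted prices.
    """
    cost = prefix_tokens * write_mult  # the first request always writes the cache
    for gap in gaps_minutes:
        if gap <= ttl_minutes:
            cost += prefix_tokens * read_mult   # still cached: cheap read
        else:
            cost += prefix_tokens * write_mult  # expired: pay for a fresh write
    return cost


# Hypothetical session: 100k-token prefix, three follow-ups ten minutes apart.
gaps = [10, 10, 10]
short_ttl = session_cost(100_000, gaps, ttl_minutes=5, write_mult=1.25, read_mult=0.1)
long_ttl = session_cost(100_000, gaps, ttl_minutes=60, write_mult=2.0, read_mult=0.1)
```

With these placeholder numbers the 5-minute tier rewrites the prefix on every request, while the 1-hour tier writes once and reads three times, so pause length rather than the per-write price dominates the total.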
For teams running long coding sessions, the practical takeaway is straightforward: inspect local session JSONL files, compare 5-minute versus 1-hour cache writes, and watch whether prompt reuse patterns changed after March 6, 2026. If the regression is real, batching related work into tighter time windows or reducing unnecessary context churn may matter more than model choice. If Anthropic clarifies the default later, this thread will still stand as a useful example of users reverse-engineering platform economics from developer telemetry.
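One way to judge whether batching work into tighter windows would help is to look at the pauses between consecutive requests. The sketch below assumes ISO 8601 timestamps and a cache lifetime that restarts on each use; the function name and bucket labels are illustrative. Pauses longer than 5 minutes but no longer than 1 hour are the ones that would stay cached under the longer TTL and expire under the shorter one.

```python
from datetime import datetime, timedelta


def classify_gaps(timestamps,
                  short_ttl=timedelta(minutes=5),
                  long_ttl=timedelta(hours=1)):
    """Bucket the pauses between consecutive requests by which TTL covers them."""
    # Normalise a trailing "Z" so older Python versions can parse the string.
    times = sorted(
        datetime.fromisoformat(t.replace("Z", "+00:00")) for t in timestamps
    )
    counts = {"within_short": 0, "between": 0, "beyond_long": 0}
    for previous, current in zip(times, times[1:]):
        gap = current - previous
        if gap <= short_ttl:
            counts["within_short"] += 1  # cached under either TTL
        elif gap <= long_ttl:
            counts["between"] += 1       # cached only under the longer TTL
        else:
            counts["beyond_long"] += 1   # expired under either TTL
    return counts
```

A session where most pauses land in the "between" bucket is the kind most exposed to the reported change, and the kind where tighter batching should pay off.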
Sources: Hacker News discussion, GitHub issue #46829, related quota issue #45756.