RepoRank
Claude Code Cache TTL Quietly Dropped From One Hour to Five Minutes, and Quotas Keep Draining


Alex Attinger

Anthropic changed the default prompt cache TTL (time to live) for Claude Code from one hour to five minutes around early March 2026, a shift that coincided with developer complaints about faster quota exhaustion. The company says the shorter cache can be cheaper for certain request patterns. Users running long coding sessions with large context windows say their quotas now burn at rates that make the service difficult to use.

What Changed

Developer Sean Swanson filed a detailed bug report on April 12, backed by analysis of 119,866 API calls across two machines spanning January 11 to April 11, 2026. The data, extracted from Claude Code session JSONL files, shows a clear three-phase pattern.

From February 1 through March 5, every cache write used the one-hour TTL tier. On March 6, five-minute cache tokens reappeared for the first time in 33 days. By March 8, five-minute tokens outnumbered one-hour tokens by a ratio of 5:1. No client-side changes were made between phases.
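Swanson's tally can be sketched in a few lines of Python. This is a hedged sketch, not his actual tooling: the field names follow the Anthropic Messages API usage block (`cache_creation.ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens`), and the assumption that Claude Code session JSONL files expose usage under `message.usage` may not match the real file layout.

```python
import json
from collections import defaultdict
from pathlib import Path

def tally_cache_tiers(session_dir):
    """Sum cache-write tokens per day, split by TTL tier.

    Assumes each JSONL line carries an API usage block at
    entry["message"]["usage"]["cache_creation"] with the
    ephemeral_5m / ephemeral_1h token counters.
    """
    daily = defaultdict(lambda: {"5m": 0, "1h": 0})
    for path in Path(session_dir).glob("*.jsonl"):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            entry = json.loads(line)
            usage = entry.get("message", {}).get("usage", {})
            creation = usage.get("cache_creation", {})
            day = entry.get("timestamp", "")[:10]  # YYYY-MM-DD
            daily[day]["5m"] += creation.get("ephemeral_5m_input_tokens", 0)
            daily[day]["1h"] += creation.get("ephemeral_1h_input_tokens", 0)
    return dict(daily)
```

Plotting the two counters per day would surface the same three-phase pattern: all-1h writes, then a 5m reappearance, then 5m dominance.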

"The 5m TTL is disproportionately punishing for the long-session, high-context use case that defines Claude Code usage," Swanson said in the issue.

The pricing difference between the two cache tiers matters for how quickly quota is consumed. According to Anthropic's prompt caching documentation, writing to a five-minute cache costs 1.25x the base input token price, while writing to a one-hour cache costs 2x. Reading from either cache costs just 0.1x base price. The five-minute write is cheaper per write, but the cache expires faster, meaning long sessions with gaps between prompts trigger full cache rebuilds more often.
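The trade-off can be made concrete with a toy cost model. This is an illustrative sketch, not Anthropic's billing logic: the $3-per-million-token base price is an assumed Sonnet-class figure, and the model prices only the cached prefix, ignoring the new tokens each turn adds.

```python
def session_cost(context_tokens, gaps_minutes, ttl_minutes, write_mult, base=3.0):
    """Rough USD cost of re-prompting a cached context.

    base: assumed price per million input tokens (illustrative).
    Each prompt reads the cache at 0.1x base if the gap since the
    previous prompt is within the TTL, or rewrites the whole prefix
    at write_mult x base if the cache has expired.
    """
    per_tok = base / 1_000_000
    cost = context_tokens * per_tok * write_mult  # initial cache write
    for gap in gaps_minutes:
        if gap <= ttl_minutes:
            cost += context_tokens * per_tok * 0.1         # cache hit: cheap read
        else:
            cost += context_tokens * per_tok * write_mult  # expired: full rewrite
    return cost

# 200k-token context, prompts spaced 2, 10, 30, and 2 minutes apart:
gaps = [2, 10, 30, 2]
five_min = session_cost(200_000, gaps, 5, 1.25)   # ~ $2.37
one_hour = session_cost(200_000, gaps, 60, 2.0)   # ~ $1.44
```

With even two gaps longer than five minutes, the cheaper-per-write 5m tier ends up costing more overall, which is exactly the long-session pattern Swanson describes. Flip the gaps to under five minutes and the 5m tier wins, which is Anthropic's one-shot and subagent argument.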

Swanson's cost analysis showed a 17.1% overall cost increase across the dataset when comparing actual costs to a hypothetical where all cache writes used the one-hour tier. March alone showed a 25.9% gap.

What Anthropic Says

Jarred Sumner, creator of the Bun JavaScript runtime and now an Anthropic engineer, acknowledged the analysis as "good detective work" but said the five-minute default actually makes Claude Code cheaper because "a meaningful share of Claude Code's requests are one-shot calls where the cached context is used once and not revisited." Sumner said the Claude Code client determines cache TTL automatically and there are no plans for a user-facing global setting.

Swanson partly conceded the point. Sessions using subagents interact quickly enough that "their caches almost never expire," he said, so the lower write cost of the five-minute tier benefits those workflows. But he added that he had been a $200-per-month subscriber for over six months and had never hit a quota limit until March. The "extra burn rate" is "making a once great service unusable," he said.

Boris Cherny, Claude Code's creator, posted on Hacker News that the team had been investigating the broader quota reports and identified several contributing factors beyond cache TTL:

  • Cache misses on the one-million-token context window available with Claude Opus 4.6 and Sonnet 4.6 are expensive. "If you leave your computer for over an hour then continue a stale session, it's often a full cache miss," Cherny said.
  • Users are "pulling in a large number of skills, or running many agents or background automations," which drives up token consumption.

Cherny said Anthropic is investigating a 400,000-token default context window, with an option to configure up to one million tokens. A configuration setting already exists: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000.

Why Quotas Still Drain

The cache TTL change landed during a period of broader quota instability. On March 31, The Register reported that Anthropic had acknowledged "people are hitting usage limits in Claude Code way faster than expected" and called it the top priority for the team.

Several other factors overlapped with the cache shift. Anthropic reduced quotas during peak hours in late March, a change it said would affect about 7 percent of users. A promotion that doubled usage limits outside a six-hour peak window ended on March 28. And at least one user claimed to have found two independent bugs in the caching code that "cause prompt cache to break, silently inflating costs by 10-20x."

One user on the enterprise team plan described the deterioration in concrete terms: "In March I could use Opus all day and it was getting great results. Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing."

Pro plan users ($20 per month) have reported being limited to as few as two prompts in five hours.

What Developers Should Watch

The gap between Anthropic's explanation and user experience points to several areas worth monitoring.

Stale sessions with large context windows are a significant cost factor Anthropic has identified. A session left idle for more than an hour with a one-million-token context will trigger a full cache miss on the next prompt, reprocessing the entire context at full input token price.
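That scenario is easy to price back-of-the-envelope. Assuming an illustrative base input price of $3 per million tokens (an assumed figure, not an official price for any specific model) and the documented cache multipliers:

```python
# Cost of one prompt against a fully populated one-million-token context.
base_per_mtok = 3.00   # assumed base input price, USD per million tokens
context_mtok = 1.0     # one-million-token context window

cache_read = context_mtok * base_per_mtok * 0.10   # warm cache read: $0.30
full_miss  = context_mtok * base_per_mtok          # cold reprocess:  $3.00
rewrite_5m = context_mtok * base_per_mtok * 1.25   # re-cache at 5m:  $3.75
```

Under these assumptions, one stale-session prompt that misses and re-caches costs 12.5x what a warm cache read would have, before any quota-side accounting is applied.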

Plugin and agent sprawl increases context size in ways that may not be visible to users. Cherny said Anthropic is working on UX improvements to make these costs more transparent and on "more intelligently truncating, pruning, and scheduling non-main tasks to avoid surprise token usage."

Quota accounting remains opaque. Anthropic does not publish exact usage limits for its plans. The Pro plan promises "at least five times the usage per session compared to our free service," and the Max plan advertises a 20x multiplier. One user who instrumented their API responses with a proxy found a 1,500x spread in tokens consumed per percentage point of quota between their most and least efficient requests, which they said "is not explainable by cache behavior differences alone."

Cherny said the team has ruled out several hypotheses including "adaptive thinking, other kinds of harness regressions, model and inference regressions." The investigation continues. For now, the most actionable step Anthropic recommends is running /clear before continuing a long stale session, and using /feedback to submit specific reports the team can debug.
