DeepSeek cuts input cache pricing to one-tenth across its full API line
Original: DeepSeek Input Cache Price Drop
DeepSeek used a short pricing tweet to signal a much larger fight over inference economics. The company said input-cache hits across its entire API line now cost one-tenth of the previous rate, effective immediately. That matters because repeated context often decides the real bill for production systems: copilots with long system prompts, retrieval pipelines that reuse the same prefix blocks, and agent loops that keep carrying instructions forward across turns.
“Effective immediately, the price for input cache hits across the ENTIRE DeepSeek API series is reduced to just 1/10th of the original price… The DeepSeek-V4-Pro 75% OFF promotion is still active.”
The key detail is scope. DeepSeek did not frame this as a discount for one flagship model or a narrow beta tier; it described the cut as applying to the entire DeepSeek API series. That is material because cache-hit pricing can drive production behavior more than headline output-token numbers do. If the reduction really holds at 90%, developers can afford to keep stable instruction blocks, policy text, or session memory in place instead of trimming prompts aggressively on every request. For enterprise chat, coding assistants, and RAG-heavy workloads, that can shift both architecture choices and vendor routing.
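The arithmetic behind that shift is simple to sketch. The model below estimates per-request cost when a large instruction prefix is served from cache; all dollar rates and token counts are hypothetical placeholders, not DeepSeek's actual list prices, and the only fact taken from the announcement is that the new cache-hit rate is one-tenth of the old one.

```python
# Sketch of a per-request cost model under cached vs. uncached input pricing.
# All rates are hypothetical placeholders, not DeepSeek's published prices.

def request_cost(prompt_tokens: int, cached_tokens: int, output_tokens: int,
                 input_rate: float, cache_hit_rate: float,
                 output_rate: float) -> float:
    """Cost in dollars for one request; rates are $ per 1M tokens."""
    uncached = prompt_tokens - cached_tokens
    return (uncached * input_rate
            + cached_tokens * cache_hit_rate
            + output_tokens * output_rate) / 1_000_000

# Hypothetical rates: $0.50/M for uncached input, $1.50/M for output,
# and an assumed old cache-hit rate of $0.10/M, now cut to one-tenth.
INPUT_RATE, OUTPUT_RATE = 0.50, 1.50
OLD_HIT = 0.10
NEW_HIT = OLD_HIT / 10

# An agent turn that reuses a stable 20k-token system/instruction prefix
# and adds 4k tokens of fresh context per request.
before = request_cost(24_000, 20_000, 1_000, INPUT_RATE, OLD_HIT, OUTPUT_RATE)
after = request_cost(24_000, 20_000, 1_000, INPUT_RATE, NEW_HIT, OUTPUT_RATE)
print(f"before=${before:.5f}  after=${after:.5f}")
```

Under these placeholder numbers the cached prefix goes from a meaningful line item to near-noise, which is why a team might stop trimming prompts and instead pin long policy or memory blocks in place.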
The deepseek_ai account is the company’s official feed and often uses X for model and API updates. In this case the post linked back to DeepSeek’s website rather than a paper, repository, or detailed changelog, so the pricing delta itself is the main signal. The tweet also kept a second lever in view by reminding readers that the DeepSeek-V4-Pro 75% discount is still active, which suggests a coordinated push on both list pricing and promotional capture.
The next thing to watch is how the new meter appears on official pricing pages and whether rivals answer with lower cache prices of their own. Cache-heavy workloads are now one of the easiest places to win traffic, and a one-tenth rate is large enough to force procurement teams to rerun their cost models. If there are no hidden caps or carve-outs, DeepSeek has given developers a very direct reason to test long-context workloads on its stack. Source: tweet.