DeepSeek cuts input cache pricing to one-tenth across its full API line
Original: DeepSeek Input Cache Price Drop
DeepSeek used a short pricing tweet to signal a much larger fight over inference economics. The company said input-cache hits across its entire API line now cost one-tenth of the previous rate, effective immediately. That matters because repeated context often decides the real bill for production systems: copilots with long system prompts, retrieval pipelines that reuse the same prefix blocks, and agent loops that keep carrying instructions forward across turns.
“Effective immediately, the price for input cache hits across the ENTIRE DeepSeek API series is reduced to just 1/10th of the original price… The DeepSeek-V4-Pro 75% OFF promotion is still active.”
The key detail is scope. DeepSeek did not frame this as a discount for one flagship model or a narrow beta tier; it described the cut as applying to the entire DeepSeek API series. That is material because cache-hit pricing can drive production behavior more than headline output-token numbers do. If the reduction really holds at 90%, developers can afford to keep stable instruction blocks, policy text, or session memory in place instead of trimming prompts aggressively on every request. For enterprise chat, coding assistants, and RAG-heavy workloads, that can shift both architecture choices and vendor routing.
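The arithmetic behind that shift is simple to sketch. The model below estimates per-request cost when a large instruction prefix is served from cache; all dollar rates and token counts are hypothetical placeholders, not DeepSeek's actual list prices, and the only fact taken from the announcement is that the new cache-hit rate is one-tenth of the old one.

```python
# Sketch of a per-request cost model under cached vs. uncached input pricing.
# All rates are hypothetical placeholders, not DeepSeek's published prices.

def request_cost(prompt_tokens: int, cached_tokens: int, output_tokens: int,
                 input_rate: float, cache_hit_rate: float,
                 output_rate: float) -> float:
    """Cost in dollars for one request; rates are $ per 1M tokens."""
    uncached = prompt_tokens - cached_tokens
    return (uncached * input_rate
            + cached_tokens * cache_hit_rate
            + output_tokens * output_rate) / 1_000_000

# Hypothetical rates: $0.50/M for uncached input, $1.50/M for output,
# and an assumed old cache-hit rate of $0.10/M, now cut to one-tenth.
INPUT_RATE, OUTPUT_RATE = 0.50, 1.50
OLD_HIT = 0.10
NEW_HIT = OLD_HIT / 10

# An agent turn that reuses a stable 20k-token system/instruction prefix
# and adds 4k tokens of fresh context per request.
before = request_cost(24_000, 20_000, 1_000, INPUT_RATE, OLD_HIT, OUTPUT_RATE)
after = request_cost(24_000, 20_000, 1_000, INPUT_RATE, NEW_HIT, OUTPUT_RATE)
print(f"before=${before:.5f}  after=${after:.5f}")
```

Under these placeholder numbers the cached prefix goes from a meaningful line item to near-noise, which is why a team might stop trimming prompts and instead pin long policy or memory blocks in place.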
The deepseek_ai account is the company’s official feed and often uses X for model and API updates. In this case the post linked back to DeepSeek’s website rather than a paper, repository, or detailed changelog, so the pricing delta itself is the main signal. The tweet also kept a second lever in view by reminding readers that the DeepSeek-V4-Pro 75% discount is still active, which suggests a coordinated push on both list pricing and promotional capture.
The next thing to watch is how the new meter appears on official pricing pages and whether rivals answer with lower cache prices of their own. Cache-heavy workloads are now one of the easiest places to win traffic, and a one-tenth rate is large enough to force procurement teams to rerun their cost models. If there are no hidden caps or carve-outs, DeepSeek has given developers a very direct reason to test long-context workloads on its stack. Source: tweet.