This matters because the fight over model copying is no longer confined to lobbying letters and company blog posts. Reuters reported on April 26 that the U.S. State Department told diplomats worldwide to warn foreign governments about AI models allegedly distilled from U.S. systems, naming DeepSeek and also mentioning Moonshot AI and MiniMax.
Cache-hit pricing can decide whether long-context assistants are cheap enough to ship. DeepSeek said the entire API series now charges just one-tenth of the old rate for input cache hits, while keeping a 75% off V4-Pro promotion live.
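To see why cache-hit pricing can dominate a long-context bill, here is a minimal sketch of blended input cost under a cache-hit discount. All per-token prices below are hypothetical placeholders, not DeepSeek's actual rates; only the "one-tenth of the old hit rate" structure comes from the announcement.

```python
# Illustrative only: estimate blended input cost when a fraction of input
# tokens hit the provider's context cache at a discounted rate.
# Prices are hypothetical, expressed per million tokens.

def blended_input_cost(tokens: int, hit_ratio: float,
                       miss_price: float, hit_price: float) -> float:
    """Dollar cost of `tokens` input tokens when `hit_ratio` of them
    are served from cache at `hit_price` instead of `miss_price`."""
    hits = tokens * hit_ratio
    misses = tokens - hits
    return (hits * hit_price + misses * miss_price) / 1_000_000

# Suppose misses cost $0.50/M and the old hit price of $0.10/M drops to
# one-tenth, $0.01/M (made-up numbers). A chatty assistant that re-sends
# a long shared prefix might see an 80% hit ratio on 2M input tokens:
cost = blended_input_cost(tokens=2_000_000, hit_ratio=0.8,
                          miss_price=0.50, hit_price=0.01)
print(f"${cost:.3f}")  # → $0.216
```

At an 80% hit ratio the cache discount, not the headline miss price, sets most of the bill, which is why a 10x cut to the hit rate matters for assistants that replay long conversations.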
Why it matters: model launches live or die on serving and training support, not just weights. LMSYS says its Day-0 stack reached 199 tok/s on B200 and 266 tok/s on H200, while staying strong out to 900K context.
Why it matters: open models rarely arrive with both giant context claims and deployable model splits. DeepSeek put hard numbers on the release with a 1M-context design, a 1.6T/49B Pro model, and a 284B/13B Flash variant.
HN did not latch onto DeepSeek V4 because of a polished launch page. The thread took off when commenters realized the front-page link was just updated docs while the weights and base models were already live for inspection.
LocalLLaMA upvoted this because it felt like real plumbing, not another benchmark screenshot. The excitement was about DeepSeek open-sourcing faster expert-parallel communication and reusable GPU kernels.
A remarkable 13-month comparison: running frontier-level DeepSeek R1 at ~5 tokens/second cost $6,000 in early 2025. Today, you can run a significantly stronger model at the same speed on a $600 mini PC, and get 17-20 t/s with even more capable models.
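The arithmetic behind that comparison can be made explicit. The dollar and throughput figures come from the blurb above; the "tokens/second per hardware dollar" metric is just an illustrative normalization, not a standard benchmark.

```python
# Rough cost-per-performance comparison using the figures quoted in the text.
# The normalization (throughput per dollar of hardware) is illustrative only.

def tps_per_dollar(tok_per_sec: float, hardware_cost: float) -> float:
    """Tokens/second delivered per dollar of hardware spend."""
    return tok_per_sec / hardware_cost

early_2025 = tps_per_dollar(5, 6000)    # R1-class rig, early 2025
today_same = tps_per_dollar(5, 600)     # same ~5 t/s on a $600 mini PC
today_fast = tps_per_dollar(20, 600)    # upper end of the 17-20 t/s claim

print(f"{today_same / early_2025:.0f}x")  # → 10x
print(f"{today_fast / early_2025:.0f}x")  # → 40x
```

So even taking the conservative same-speed case, hardware cost-efficiency for this class of model improved about 10x in 13 months, and up to roughly 40x at the higher quoted throughput.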
The Financial Times reports that DeepSeek V4 is set to launch next week, featuring image and video generation capabilities that position it as a direct competitor to multimodal AI models from OpenAI and Google.
A trending r/LocalLLaMA thread highlighted the DualPath paper on KV-Cache bottlenecks in disaggregated inference systems. The arXiv abstract reports up to 1.87x offline throughput and 1.96x average online throughput gains while meeting SLO.
Anthropic revealed that Chinese AI labs DeepSeek, Moonshot AI, and MiniMax created over 24,000 fraudulent accounts and generated 16+ million Claude exchanges to extract its capabilities and improve their own competing models.