DeepSeek-V4 opens 1M context with a 1.6T/49B Pro and a 284B/13B Flash split
Original: DeepSeek-V4 Preview is live, open-sourced, and built around 1M context
What the post changed
DeepSeek moved its next flagship model from rumor to runnable release in one shot. The official account wrote that “DeepSeek-V4 Preview is officially live & open-sourced” and paired that with a concrete spec sheet rather than vague capability claims. The tweet says the Pro model uses 1.6T total parameters with 49B active, while the Flash model uses 284B total with 13B active, both positioned around a 1M-token context window. That matters because open-weight launches often obscure the serving tradeoffs; this post states the total/active split up front.
“DeepSeek-V4-Pro: 1.6T total / 49B active params… DeepSeek-V4-Flash: 284B total / 13B active params… API is updated & available today!”
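The total/active split is the standard mixture-of-experts tradeoff: weight memory scales with total parameters, while per-token compute scales with active parameters. Here is a back-of-envelope sketch of what the quoted specs imply, assuming FP8 weights (1 byte per parameter) and the rough ~2 FLOPs per active parameter per token heuristic; neither serving detail is confirmed in the post:

```python
# Back-of-envelope MoE serving math for the quoted V4 specs.
# Assumptions (not from the post): FP8 weights at 1 byte/param and
# the rough ~2 FLOPs per active parameter per generated token rule.

SPECS = {
    "DeepSeek-V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "DeepSeek-V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, s in SPECS.items():
    weight_mem_tb = s["total"] / 1e12        # bytes -> terabytes at 1 B/param
    gflops_per_tok = 2 * s["active"] / 1e9   # forward-pass heuristic
    active_frac = s["active"] / s["total"]
    print(f"{name}: ~{weight_mem_tb:.2f} TB weights, "
          f"~{gflops_per_tok:.0f} GFLOPs/token, {active_frac:.1%} active")
```

Under those assumptions, Pro holds about 5.6x Flash's weights but does only about 3.8x its per-token compute, which is exactly the gap a two-tier lineup is built to price against.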
The account is DeepSeek’s primary release channel, so it usually carries first-party model rollouts rather than commentary. The linked material matters almost as much as the tweet itself. DeepSeek attached a technical report hosted on Hugging Face and an open-weights collection page, which turns the post from marketing copy into a package developers can inspect. The launch link also points users straight to chat.deepseek.com, signaling that the company wants immediate hands-on use instead of a waitlist cycle.
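The “API is updated & available today” claim is testable immediately. DeepSeek's API has historically been OpenAI-compatible at api.deepseek.com, so a smoke test would look roughly like the sketch below; the model identifier is a placeholder guess, since the post does not name one:

```python
# Minimal smoke test against DeepSeek's OpenAI-compatible endpoint.
# The base URL matches DeepSeek's existing API; the model name
# "deepseek-v4" is a placeholder, not confirmed by the post.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # issued at platform.deepseek.com
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-v4",               # hypothetical identifier
    messages=[{"role": "user", "content": "One line: what is new in V4?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```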
Why the split is the real story
The interesting design choice is not only model size but the two-lane product shape. A Pro tier with 49B active parameters targets frontier quality, while a Flash tier with 13B active parameters gives DeepSeek a cheaper and faster lane for production traffic. That is a more operationally useful framing than a single giant checkpoint. It suggests DeepSeek is trying to win on cost control as much as on raw evaluation scores, especially now that long context is becoming table stakes for coding, agents, and document-heavy work.
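If pricing tracks the active-parameter gap, the natural integration pattern is a router that defaults to Flash and escalates to Pro only when a request looks hard or unusually long. A toy sketch of that pattern follows; the model names, thresholds, and keywords are illustrative assumptions, not anything DeepSeek has published:

```python
# Toy two-lane router: default to the cheap Flash lane, escalate to
# Pro on long or reasoning-heavy prompts. All names and thresholds
# here are illustrative assumptions.

ESCALATE_KEYWORDS = ("prove", "refactor", "multi-step", "audit")
LONG_PROMPT_TOKENS = 200_000  # arbitrary cutoff, well inside a 1M window

def pick_model(prompt: str) -> str:
    approx_tokens = len(prompt) // 4  # crude chars-per-token estimate
    hard = any(k in prompt.lower() for k in ESCALATE_KEYWORDS)
    if hard or approx_tokens > LONG_PROMPT_TOKENS:
        return "deepseek-v4-pro"      # quality lane
    return "deepseek-v4-flash"        # cost/latency lane

print(pick_model("Summarize this changelog."))      # -> deepseek-v4-flash
print(pick_model("Prove this invariant holds."))    # -> deepseek-v4-pro
```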
What to watch next is whether independent benchmarks validate the 1M-context promise under real workloads, and whether the updated API pricing lands in a range that pressures other open and closed providers. The source tweet has already drawn more than 8.4 million views, which suggests the market was waiting for a concrete open release, not another teaser.
Source: DeepSeek source tweet · technical report · open weights collection