HN Spots the Real DeepSeek V4 Story: The Docs Link Was Thin, but the Weights Were Already Live
Original: DeepSeek v4
HN did not treat DeepSeek V4 like a normal model drop. The thread took off because people opened the link and noticed it pointed to updated API docs, not a glossy launch page, and then other commenters immediately surfaced the real payload: the weights were already live on Hugging Face. That mismatch became part of the story. For HN, the interesting bit was not only that DeepSeek had shipped another flagship open model, but that the release escaped through infrastructure first and marketing second.
The model card fills in why the reaction was so intense. DeepSeek says the V4 preview line has two Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6T total parameters and 49B activated, and DeepSeek-V4-Flash with 284B total and 13B activated. Both stretch to a 1M-token context window. DeepSeek also says the new hybrid attention stack cuts single-token inference FLOPs to 27% of V3.2's and the KV cache to 10% at the 1M-token setting, and that training ran on more than 32T tokens. In other words, this is not a minor checkpoint refresh; it is a big swing at long-context efficiency, agentic use, and open-weight bragging rights. The official model card and technical report are linked from Hugging Face.
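One quick way to read those figures is as ratios. The sketch below simply restates the numbers quoted above as percentages and multipliers; it assumes nothing beyond the model-card claims and measures nothing itself.

```python
# Back-of-envelope sketch of the ratios behind the model-card claims.
# All inputs are the figures quoted above; none of this is measured here.

pro_total, pro_active = 1.6e12, 49e9      # DeepSeek-V4-Pro: total vs. activated params
flash_total, flash_active = 284e9, 13e9   # DeepSeek-V4-Flash: total vs. activated params

print(f"V4-Pro activates about {pro_active / pro_total:.1%} of its parameters per token")
print(f"V4-Flash activates about {flash_active / flash_total:.1%} of its parameters per token")

# Claimed efficiency at the 1M-token setting, relative to V3.2:
# 27% of the single-token inference FLOPs, 10% of the KV cache.
flops_fraction, kv_fraction = 0.27, 0.10
print(f"Single-token FLOPs: roughly {1 / flops_fraction:.1f}x less compute than V3.2")
print(f"KV cache at 1M tokens: roughly {1 / kv_fraction:.0f}x smaller than V3.2")
```

Put that way, both models activate only a few percent of their weights per token, which is what makes the long-context and serving-cost claims plausible on paper.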
HN commenters zeroed in on the release mechanics and the practical implications. One thread pointed out that the front-page link undersold the launch because the real artifacts were the model weights, base variants, and evaluation tables that appeared on Hugging Face almost immediately. Another cluster of comments focused on the claim that V4-Pro-Max now sits at the top of the open-source field. DeepSeek’s own tables back that posture with numbers like 93.5 on LiveCodeBench and 67.9 on Terminal Bench 2.0, while also positioning the model against Opus, GPT, Gemini, and Kimi on long-context and agent benchmarks.
The reason this landed so hard on HN is simple: people there have seen enough model launches to ignore vague hype. What kept this thread moving was the sense that DeepSeek dropped something operationally real. Weights, base models, long-context engineering, and competitive benchmark tables all appeared fast enough for developers to inspect rather than merely admire. The Hacker News thread reads like a live audit of whether the open-weight world can still spring surprises on short notice.
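For readers who want to do that inspection themselves, here is a minimal sketch using the huggingface_hub client to list what was actually published alongside the weights. The repo IDs are placeholders for illustration, since the article does not give exact paths; substitute whatever DeepSeek actually published.

```python
# Minimal sketch: enumerate the files published with the release.
# The repo IDs below are assumptions, not confirmed paths.
from huggingface_hub import HfApi

api = HfApi()
for repo_id in ["deepseek-ai/DeepSeek-V4-Pro", "deepseek-ai/DeepSeek-V4-Flash"]:
    info = api.model_info(repo_id)  # raises if the repo does not exist
    files = [f.rfilename for f in info.siblings]
    print(repo_id)
    print("  config / card:", [f for f in files if f.endswith((".json", ".md"))])
    print("  safetensors shards:", sum(f.endswith(".safetensors") for f in files))
```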
Related Articles
HN read Kimi K2.6 as a test of whether open-weight coding agents can last through real engineering work. The 12-hour and 13-hour coding cases drew attention, while commenters immediately pressed on speed, provider accuracy, and benchmark realism.
LocalLLaMA upvoted this because it felt like real plumbing, not another benchmark screenshot. The excitement was about DeepSeek open-sourcing faster expert-parallel communication and reusable GPU kernels.
A high-engagement r/LocalLLaMA post surfaced the Qwen3.5-35B-A3B model card on Hugging Face. The card emphasizes MoE efficiency, long context handling, and deployment paths across common open-source inference stacks.