HN Spots the Real DeepSeek V4 Story: The Docs Link Was Thin, but the Weights Were Already Live
Original: DeepSeek v4
HN did not treat DeepSeek V4 like a normal model drop. The thread took off because people opened the link and noticed it pointed to updated API docs, not a glossy launch page, and then other commenters immediately surfaced the real payload: the weights were already live on Hugging Face. That mismatch became part of the story. For HN, the interesting bit was not only that DeepSeek had shipped another flagship open model, but that the release escaped through infrastructure first and marketing second.
The model card fills in why the reaction was so intense. DeepSeek says the V4 preview line has two Mixture-of-Experts models: DeepSeek-V4-Pro with 1.6T total parameters and 49B activated, and DeepSeek-V4-Flash with 284B total and 13B activated. Both stretch to a 1M-token context window. DeepSeek also says the new hybrid attention stack cuts single-token inference FLOPs to 27% of V3.2's and the KV cache to 10% at the 1M-token setting, and that training ran on more than 32T tokens. In other words, this is not a minor checkpoint refresh; it is a big swing at long-context efficiency, agentic use, and open-weight bragging rights. The official model card and technical report are linked from Hugging Face.
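One quick way to read those figures is as ratios. The sketch below simply restates the numbers quoted above as percentages and multipliers; it assumes nothing beyond the model-card claims and measures nothing itself.

```python
# Back-of-envelope sketch of the ratios behind the model-card claims.
# All inputs are the figures quoted above; none of this is measured here.

pro_total, pro_active = 1.6e12, 49e9      # DeepSeek-V4-Pro: total vs. activated params
flash_total, flash_active = 284e9, 13e9   # DeepSeek-V4-Flash: total vs. activated params

print(f"V4-Pro activates about {pro_active / pro_total:.1%} of its parameters per token")
print(f"V4-Flash activates about {flash_active / flash_total:.1%} of its parameters per token")

# Claimed efficiency at the 1M-token setting, relative to V3.2:
# 27% of the single-token inference FLOPs, 10% of the KV cache.
flops_fraction, kv_fraction = 0.27, 0.10
print(f"Single-token FLOPs: roughly {1 / flops_fraction:.1f}x less compute than V3.2")
print(f"KV cache at 1M tokens: roughly {1 / kv_fraction:.0f}x smaller than V3.2")
```

Put that way, both models activate only a few percent of their weights per token, which is what makes the long-context and serving-cost claims plausible on paper.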
HN commenters zeroed in on the release mechanics and the practical implications. One thread pointed out that the front-page link undersold the launch because the real artifacts were the model weights, base variants, and evaluation tables that appeared on Hugging Face almost immediately. Another cluster of comments focused on the claim that V4-Pro-Max now sits at the top of the open-source field. DeepSeek’s own tables back that posture with numbers like 93.5 on LiveCodeBench and 67.9 on Terminal Bench 2.0, while also positioning the model against Opus, GPT, Gemini, and Kimi on long-context and agent benchmarks.
The reason this landed so hard on HN is simple: people there have seen enough model launches to ignore vague hype. What kept this thread moving was the sense that DeepSeek dropped something operationally real. Weights, base models, long-context engineering, and competitive benchmark tables all appeared fast enough for developers to inspect rather than merely admire. The Hacker News thread reads like a live audit of whether the open-weight world can still spring surprises on short notice.
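For readers who want to do that inspection themselves, here is a minimal sketch using the huggingface_hub client to list what was actually published alongside the weights. The repo IDs are placeholders for illustration, since the article does not give exact paths; substitute whatever DeepSeek actually published.

```python
# Minimal sketch: enumerate the files published with the release.
# The repo IDs below are assumptions, not confirmed paths.
from huggingface_hub import HfApi

api = HfApi()
for repo_id in ["deepseek-ai/DeepSeek-V4-Pro", "deepseek-ai/DeepSeek-V4-Flash"]:
    info = api.model_info(repo_id)  # raises if the repo does not exist
    files = [f.rfilename for f in info.siblings]
    print(repo_id)
    print("  config / card:", [f for f in files if f.endswith((".json", ".md"))])
    print("  safetensors shards:", sum(f.endswith(".safetensors") for f in files))
```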
Related Articles
HN read Kimi K2.6 as a test of whether open-weight coding agents can last through real engineering work. The 12-hour and 13-hour coding cases drew attention, while commenters immediately pressed on speed, provider accuracy, and benchmark realism.
LocalLLaMA upvoted this because it felt like real plumbing, not another benchmark screenshot. The excitement was about DeepSeek open-sourcing faster expert-parallel communication and reusable GPU kernels.
A high-engagement r/LocalLLaMA post surfaced the Qwen3.5-35B-A3B model card on Hugging Face. The card emphasizes MoE efficiency, long context handling, and deployment paths across common open-source inference stacks.