DeepSeek V4: Near-Frontier LLM Performance at a Fraction of the Cost
Original: "DeepSeek V4: almost on the frontier, a fraction of the price"
DeepSeek V4 Release
Chinese AI lab DeepSeek has released two new models, DeepSeek-V4-Pro and V4-Flash. Both are Mixture-of-Experts models with a 1-million-token context window, released under the MIT license, continuing DeepSeek's practice of open-weights releases designed to compete with frontier proprietary models.
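For readers unfamiliar with the architecture, here is a minimal sketch of Mixture-of-Experts routing. All dimensions below are illustrative placeholders, not DeepSeek's actual configuration; the point is simply that each token activates only a small top-k subset of experts, which is why "active" parameter counts sit far below totals.

import numpy as np

n_experts, top_k = 256, 8            # assumed routing config, not DeepSeek's
d_model, d_ff = 4096, 16384          # assumed layer widths

rng = np.random.default_rng(0)
token = rng.standard_normal(d_model)                # one token's hidden state
router = rng.standard_normal((d_model, n_experts))  # learned routing weights

scores = token @ router                        # score every expert...
active = np.argsort(scores)[-top_k:]           # ...but route to the top-k only

total_ff = n_experts * 2 * d_model * d_ff      # FFN parameters that exist
active_ff = len(active) * 2 * d_model * d_ff   # FFN parameters actually used
print(f"active fraction per MoE layer: {active_ff / total_ff:.1%}")  # 3.1%

That 3% ratio happens to be in the same ballpark as V4-Pro's 49B active out of 1.6T total, described next.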
Scale and Architecture
V4-Pro has 1.6 trillion total parameters with 49B active, making it the largest open-weights model released to date: it surpasses Kimi K2.6 (1.1T) and GLM-5.1 (754B) and is more than double the size of DeepSeek V3.2 (685B). V4-Flash is a lighter model at 284B total / 13B active parameters. On HuggingFace, V4-Pro weighs in at 865GB and V4-Flash at 160GB.
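As a rough sanity check on those file sizes, the implied storage per parameter can be computed directly. The parameter counts and sizes below come from this post; reading the result (~4.3 bits/param) as low-bit quantized release weights is an inference, not something DeepSeek has stated.

models = {
    # name: (total parameters, checkpoint size in bytes)
    "V4-Pro":   (1.6e12, 865e9),
    "V4-Flash": (284e9,  160e9),
}

for name, (params, size_bytes) in models.items():
    bits = size_bytes * 8 / params
    print(f"{name}: {bits:.1f} bits/param")
# V4-Pro:   4.3 bits/param
# V4-Flash: 4.5 bits/param

Both checkpoints land near 4.3 to 4.5 bits per parameter, consistent with roughly 4-bit weights plus some layers kept at higher precision.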
The Pricing Story
The headline differentiator is cost. V4-Flash costs $0.14/M input tokens and $0.28/M output tokens, cheaper than GPT-5.4 Nano ($0.20/$1.25). V4-Pro costs $1.74/M input and $3.48/M output, undercutting GPT-5.4 ($2.50/$15) and Claude Sonnet 4.6 ($3/$15) on input and charging less than a quarter of their price on output.
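To make the gap concrete, here is a quick comparison on a hypothetical workload of 10M input and 2M output tokens per day, using only the prices quoted above.

prices = {  # model: (input $/M tokens, output $/M tokens)
    "DeepSeek V4-Flash": (0.14, 0.28),
    "GPT-5.4 Nano":      (0.20, 1.25),
    "DeepSeek V4-Pro":   (1.74, 3.48),
    "GPT-5.4":           (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

in_mtok, out_mtok = 10, 2   # hypothetical daily volume, millions of tokens

for model, (p_in, p_out) in prices.items():
    daily = in_mtok * p_in + out_mtok * p_out
    print(f"{model:<18s} ${daily:6.2f}/day")

On that workload, V4-Pro comes to about $24/day against $55 for GPT-5.4 and $60 for Claude Sonnet 4.6, and V4-Flash to about $2/day against $4.50 for GPT-5.4 Nano.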
Why the Efficiency
DeepSeek's paper explains why: in 1M-token context scenarios, V4-Pro needs only 27% of V3.2's per-token FLOPs and 10% of its KV cache size; V4-Flash pushes further, to 10% of the FLOPs and 7% of the KV cache. This architectural efficiency is what enables the dramatically lower pricing. Self-reported benchmarks show V4-Pro competitive with frontier models but trailing GPT-5.4 and Gemini 3.1 Pro by approximately 3 to 6 months, a gap that may be acceptable for many use cases given the cost advantage.
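The KV-cache figure is the one that matters most at 1M tokens, since at long context it is the per-request cache, not the shared weights, that dominates serving memory. A back-of-the-envelope sketch: only the 10% and 7% ratios come from the post; the 70 KB/token baseline for V3.2 is an assumed illustrative figure.

CONTEXT = 1_000_000                  # tokens
BASELINE_KV_PER_TOKEN = 70 * 1024    # bytes/token for V3.2 (assumed figure)

ratios = {"V3.2 (baseline)": 1.00, "V4-Pro": 0.10, "V4-Flash": 0.07}

for model, r in ratios.items():
    gib = CONTEXT * BASELINE_KV_PER_TOKEN * r / 2**30
    print(f"{model:<16s} {gib:5.1f} GiB of KV cache at 1M tokens")

Under that assumption, a single 1M-token request drops from roughly 67 GiB of cache to about 7 GiB on V4-Pro, which is what lets one server hold many long-context sessions at once.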
Related Articles
HN did not latch onto DeepSeek V4 because of a polished launch page. The thread took off when commenters realized the front-page link was just updated docs while the weights and base models were already live for inspection.
HN treated Mistral Medium 3.5 as more than another model drop, focusing on four-GPU self-hosting, open weights, and remote coding agents rather than headline scores alone.
LocalLLaMA latched onto one detail immediately: a dense 128B architecture. Mistral Medium 3.5 drew attention because it tries to bundle reasoning, coding, and agent work into a model people can still imagine self-hosting.