DeepSeek V4: Near-Frontier LLM Performance at a Fraction of the Cost
Original: "DeepSeek V4: almost on the frontier, a fraction of the price"
DeepSeek V4 Release
Chinese AI lab DeepSeek has released two new models, DeepSeek-V4-Pro and V4-Flash. Both are Mixture-of-Experts models with a 1-million-token context window, released under the MIT license, continuing DeepSeek's practice of open-weights releases designed to compete with frontier proprietary models.
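For readers unfamiliar with the architecture, here is a minimal sketch of Mixture-of-Experts routing. All dimensions below are illustrative placeholders, not DeepSeek's actual configuration; the point is simply that each token activates only a small top-k subset of experts, which is why "active" parameter counts sit far below totals.

import numpy as np

n_experts, top_k = 256, 8            # assumed routing config, not DeepSeek's
d_model, d_ff = 4096, 16384          # assumed layer widths

rng = np.random.default_rng(0)
token = rng.standard_normal(d_model)                # one token's hidden state
router = rng.standard_normal((d_model, n_experts))  # learned routing weights

scores = token @ router                        # score every expert...
active = np.argsort(scores)[-top_k:]           # ...but route to the top-k only

total_ff = n_experts * 2 * d_model * d_ff      # FFN parameters that exist
active_ff = len(active) * 2 * d_model * d_ff   # FFN parameters actually used
print(f"active fraction per MoE layer: {active_ff / total_ff:.1%}")  # 3.1%

That 3% ratio happens to be in the same ballpark as V4-Pro's 49B active out of 1.6T total, described next.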
Scale and Architecture
V4-Pro has 1.6 trillion total parameters with 49B active, making it the largest open-weights model released to date: it surpasses Kimi K2.6 (1.1T) and GLM-5.1 (754B) and is more than double the size of DeepSeek V3.2 (685B). V4-Flash is a lighter model at 284B total / 13B active parameters. On HuggingFace, V4-Pro weighs in at 865GB and V4-Flash at 160GB.
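As a rough sanity check on those file sizes, the implied storage per parameter can be computed directly. The parameter counts and sizes below come from this post; reading the result (~4.3 bits/param) as low-bit quantized release weights is an inference, not something DeepSeek has stated.

models = {
    # name: (total parameters, checkpoint size in bytes)
    "V4-Pro":   (1.6e12, 865e9),
    "V4-Flash": (284e9,  160e9),
}

for name, (params, size_bytes) in models.items():
    bits = size_bytes * 8 / params
    print(f"{name}: {bits:.1f} bits/param")
# V4-Pro:   4.3 bits/param
# V4-Flash: 4.5 bits/param

Both checkpoints land near 4.3 to 4.5 bits per parameter, consistent with roughly 4-bit weights plus some layers kept at higher precision.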
The Pricing Story
The headline differentiator is cost. V4-Flash costs $0.14/M input tokens and $0.28/M output tokens, cheaper than GPT-5.4 Nano ($0.20/$1.25). V4-Pro costs $1.74/M input and $3.48/M output, undercutting GPT-5.4 ($2.50/$15) and Claude Sonnet 4.6 ($3/$15) on input and charging less than a quarter of their price on output.
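To make the gap concrete, here is a quick comparison on a hypothetical workload of 10M input and 2M output tokens per day, using only the prices quoted above.

prices = {  # model: (input $/M tokens, output $/M tokens)
    "DeepSeek V4-Flash": (0.14, 0.28),
    "GPT-5.4 Nano":      (0.20, 1.25),
    "DeepSeek V4-Pro":   (1.74, 3.48),
    "GPT-5.4":           (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

in_mtok, out_mtok = 10, 2   # hypothetical daily volume, millions of tokens

for model, (p_in, p_out) in prices.items():
    daily = in_mtok * p_in + out_mtok * p_out
    print(f"{model:<18s} ${daily:6.2f}/day")

On that workload, V4-Pro comes to about $24/day against $55 for GPT-5.4 and $60 for Claude Sonnet 4.6, and V4-Flash to about $2/day against $4.50 for GPT-5.4 Nano.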
Why the Efficiency
DeepSeek's paper explains why: in 1M-token context scenarios, V4-Pro needs only 27% of V3.2's per-token FLOPs and 10% of its KV cache size; V4-Flash pushes further, to 10% of the FLOPs and 7% of the KV cache. This architectural efficiency is what enables the dramatically lower pricing. Self-reported benchmarks show V4-Pro competitive with frontier models but trailing GPT-5.4 and Gemini 3.1 Pro by approximately 3 to 6 months, a gap that may be acceptable for many use cases given the cost advantage.
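The KV-cache figure is the one that matters most at 1M tokens, since at long context it is the per-request cache, not the shared weights, that dominates serving memory. A back-of-the-envelope sketch: only the 10% and 7% ratios come from the post; the 70 KB/token baseline for V3.2 is an assumed illustrative figure.

CONTEXT = 1_000_000                  # tokens
BASELINE_KV_PER_TOKEN = 70 * 1024    # bytes/token for V3.2 (assumed figure)

ratios = {"V3.2 (baseline)": 1.00, "V4-Pro": 0.10, "V4-Flash": 0.07}

for model, r in ratios.items():
    gib = CONTEXT * BASELINE_KV_PER_TOKEN * r / 2**30
    print(f"{model:<16s} {gib:5.1f} GiB of KV cache at 1M tokens")

Under that assumption, a single 1M-token request drops from roughly 67 GiB of cache to about 7 GiB on V4-Pro, which is what lets one server hold many long-context sessions at once.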
Related Articles
HN did not latch onto DeepSeek V4 because of a polished launch page. The thread took off when commenters realized the front-page link was just updated docs while the weights and base models were already live for inspection.
HN treated Mistral Medium 3.5 as more than another model drop, focusing on four-GPU self-hosting, open weights, and remote coding agents rather than headline scores alone.
LocalLLaMA latched onto one detail immediately: a dense 128B architecture. Mistral Medium 3.5 drew attention because it tries to bundle reasoning, coding, and agent work into a model people can still imagine self-hosting.