DeepSeek V4 Launches: 1 Trillion Parameters, 1M Context, Open-Weight
DeepSeek's Most Ambitious Release Yet
Chinese AI startup DeepSeek released DeepSeek V4 on February 17, coinciding with the Lunar New Year. The model features 1 trillion total parameters, a 1-million-token context window, and three architectural innovations: mHC (Manifold-Constrained Hyper-Connections), Engram conditional memory, and Sparse Attention. It is released as an open-weight model.
Technical Highlights
- mHC (Manifold-Constrained Hyper-Connections): addresses fundamental Transformer stability issues, improving large-scale training
- Engram conditional memory: enables efficient long-context management across sessions
- Sparse Attention: reduces inference cost while handling extended contexts
- 1M-token context window: can ingest an entire codebase in a single pass, enabling multi-file reasoning
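DeepSeek has not published the details of its Sparse Attention variant, but the cost argument behind any sparse scheme is the same: restrict each query to a subset of keys so compute grows linearly rather than quadratically with context length. A minimal sliding-window sketch in NumPy (a generic illustration, not DeepSeek's mechanism; `window` is an arbitrary choice):

```python
import numpy as np

def windowed_attention(q, k, v, window=4):
    """Sliding-window (sparse) attention: each query position attends only
    to the `window` most recent key positions, so cost is O(n * window)
    instead of the O(n^2) of dense attention. Toy single-head sketch."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)  # logits over the window only
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                    # softmax within the window
        out[i] = weights @ v[lo:i + 1]
    return out

# Usage: 8 tokens, 16-dim head; only a 4-key window is ever materialized.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
k = rng.standard_normal((8, 16))
v = rng.standard_normal((8, 16))
print(windowed_attention(q, k, v).shape)  # (8, 16)
```

At a 1M-token context, dense attention would score roughly 10^12 query-key pairs per head per layer; a fixed window (or any comparable sparsity pattern) keeps that proportional to sequence length, which is what makes the cost claims plausible.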
Performance Claims
DeepSeek's internal benchmarks report that V4 surpasses Claude 3.5 Sonnet and GPT-4o on coding tasks, achieving over 80% on SWE-bench. The company claims inference costs are 10–40× lower than those of comparable Western frontier models.
Runs on Consumer Hardware
As an open-weight release, V4 is designed to run on dual NVIDIA RTX 4090s or a single RTX 5090 — making state-of-the-art coding AI accessible outside cloud infrastructure. The model is available for immediate download by developers globally.
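The consumer-GPU claim only works if most of the 1 trillion parameters stay out of VRAM at any given moment, as in a mixture-of-experts design where only the routed experts are resident. Back-of-the-envelope arithmetic makes the gap concrete (the active-parameter count below is hypothetical; the article does not state one):

```python
def gib(params, bits):
    """Approximate weight footprint in GiB for `params` parameters stored at `bits` bits each."""
    return params * bits / 8 / 2**30

total = 1_000_000_000_000   # 1T total parameters, per the article
active = 32_000_000_000     # hypothetical active-parameter count (assumption, not from the article)

print(f"4-bit full model:      {gib(total, 4):.0f} GiB")   # ~466 GiB
print(f"4-bit active subset:   {gib(active, 4):.0f} GiB")  # ~15 GiB
```

Even at 4-bit quantization, the full weights (~466 GiB) dwarf the 48 GB of dual RTX 4090s; an active subset in the tens of billions of parameters, however, fits comfortably, with the remaining experts streamed from CPU RAM or disk.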