#mixture-of-experts

LLM Hacker News May 2, 2026 1 min read

DeepSeek V4: Near-Frontier LLM Performance at a Fraction of the Cost

DeepSeek released DeepSeek-V4-Pro (1.6T total parameters, 49B active) and V4-Flash (284B total, 13B active), both Mixture-of-Experts models with MIT license and 1M token context. V4-Pro is the largest open-weights model released so far, and its pricing at $1.74/M input undercuts GPT-5.4 and Claude Sonnet 4.6 by more than half.

#deepseek #llm #open-weights

LLM Reddit Mar 25, 2026 1 min read

LocalLLaMA surfaces MIT-licensed GigaChat 3.1 open weights in 702B and 10B sizes

LocalLLaMA surfaced an MIT-licensed GigaChat 3.1 release that pairs a 702B MoE model for clusters with a 10B MoE model aimed at faster deployment and lighter inference.

#gigachat #open-weights #multilingual

101

LLM Hacker News Mar 23, 2026 2 min read

Flash-MoE Shows 397B Qwen Inference on a 48GB MacBook Pro

A Hacker News discussion highlighted Flash-MoE, a pure C/Metal inference stack that streams Qwen3.5-397B-A17B from SSD and reaches interactive speeds on a 48GB M3 Max laptop.

#llm #mixture-of-experts #metal

112

LLM Reddit Mar 12, 2026 1 min read

r/LocalLLaMA Reacts to NVIDIA's Open Nemotron 3 Super Release

NVIDIA's new Nemotron 3 Super pairs a 120B total / 12B active hybrid Mamba-Transformer MoE with a native 1M-token context window and open weights, datasets, and recipes. LocalLLaMA discussion centered on whether those openness and efficiency claims translate into realistic home-lab deployments.

#nvidia #open-weights #mixture-of-experts