r/LocalLLaMAで注目の Mistral Small 4、119B MoE に 256k context と切替式 reasoning を統合

コミュニティが見ているのはbenchmarkの一行ではなくopen modelのまとめ方だ

2026年3月16日、Mistral Small 4 への r/LocalLLaMA リンクは504 pointsと196 commentsを集めた。関心が大きい理由は、単に大きなmodelが増えたからではない。Mistralは今回、instruct、reasoning、coding寄りの用途を別familyとして分けるのではなく、一つのopen modelにまとめようとしている。

Hugging Face model cardによれば、Mistral Small 4 は128 expertsのうち4 expertsがactiveになるMoE構造で、119B total parameters、tokenあたり6.5B activatedという設計だ。256k context windowを持ち、textとimage inputを受けてtextを出力し、function callingとJSON outputも扱える。さらに reasoning_effort をrequestごとに切り替えられ、軽い応答と深いreasoningを一つのmodelで行き来できる。Apache 2.0 licenseであることも商用評価では重要だ。

specだけでなくserving pathも同時に見られている

model cardでは、latency-optimized setupでMistral Small 3比のend-to-end completion timeを40%削減し、throughput-optimized setupではrequests per secondが3倍になると説明している。さらに speculative decoding 用のeagle head と、低精度serving向けのNVFP4 checkpointも用意されている。つまりresearch releaseというより、deployment economicsまで含めたpackageとして出してきた形だ。

そのため LocalLLaMA の反応はbenchmark chartだけには向いていない。ユーザーは、coding agent、long-context document work、multimodal assistant、reasoning-heavy taskを一つのopen modelで現実的に回せるかを見ている。同じmodel cardは、vLLM、Transformers、llama.cpp、SGLang の対応がまだ順次整っている途中で、一部pathはWIPだとも示している。評価軸は点数だけでなく、license、context、tool use、serving pathが揃うかどうかだ。

Mistral Small 4 は128-expert MoEで4 expertsだけをactiveにする。
119B total parameters、6.5B activated per token、256k contextを掲げる。
textとimage input、tool use、JSON output、switchable reasoningを備える。
Apache 2.0 licenseに加え、NVFP4とeagle pathも用意されている。

このthreadが示すのは、open-model communityが今やreleaseをdeployment package全体で判断しているということだ。Mistral Small 4 はそのチェックリストを一度に満たそうとするreleaseとして受け止められている。

出典: Reddit discussion, Hugging Face model card

r/LocalLLaMAで注目の Mistral Small 4、119B MoE に 256k context と切替式 reasoning を統合

コミュニティが見ているのはbenchmarkの一行ではなくopen modelのまとめ方だ

specだけでなくserving pathも同時に見られている

Related Articles

DeepSeekのvisual primitives、LocalLLaMAが沸いたのは仕組みと削除の速さ

Meta、マルチモーダル推論と並列エージェントを備えた Muse Spark を公開

Google DeepMind、Gemma 4を公開　agentic workflowとmultimodal local AIを強化

Comments (0)

Leave a Comment

Related Articles

DeepSeekのvisual primitives、LocalLLaMAが沸いたのは仕組みと削除の速さ

Meta、マルチモーダル推論と並列エージェントを備えた Muse Spark を公開
LLM Hacker News Apr 9, 2026 1 min read

Google DeepMind、Gemma 4を公開　agentic workflowとmultimodal local AIを強化
LLM Hacker News Apr 2, 2026 1 min read

コミュニティが見ているのはbenchmarkの一行ではなくopen modelのまとめ方だ

specだけでなくserving pathも同時に見られている

Related Articles

DeepSeekのvisual primitives、LocalLLaMAが沸いたのは仕組みと削除の速さ

Meta、マルチモーダル推論と並列エージェントを備えた Muse Spark を公開

Google DeepMind、Gemma 4を公開 agentic workflowとmultimodal local AIを強化

Comments (0)

Leave a Comment

Google DeepMind、Gemma 4を公開　agentic workflowとmultimodal local AIを強化