Hacker News Surfaces a Visual Reference for Modern LLM Architectures
Sebastian Raschka's LLM Architecture Gallery drew a strong response on Hacker News in March 2026 because it addresses a practical problem: modern open models are increasingly hard to compare using only scattered model cards, config files, and release posts. The gallery collects families such as Llama 3 8B, OLMo 2 7B, DeepSeek V3 and R1, Gemma 3 27B, Mistral Small 3.1 24B, Llama 4 Maverick, Qwen3 variants, Kimi K2, MiniMax, and GPT-OSS into one visual reference with architecture diagrams, key details, and related concepts.
Why HN found it useful
Commenters repeatedly pointed to the same advantage: dense, MoE, shared-expert, hybrid-attention, and Gated DeltaNet-style design choices are much easier to scan when they are presented in one comparable format. The value lies less in memorizing any single model and more in rebuilding a mental map of the current LLM landscape. That makes the page a fast orientation layer for engineers before they dive into deeper research or deployment tradeoffs.
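To make the dense versus MoE versus shared-expert distinction concrete, here is a minimal Python sketch of how one might count feed-forward parameters for a single layer. It assumes a SwiGLU-style FFN with three projection matrices and no biases, ignores router weights, and uses a hypothetical `FFNConfig` type with illustrative numbers rather than any specific model's published settings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FFNConfig:
    """Hypothetical per-layer feed-forward settings (illustrative field names)."""
    d_model: int
    d_ff: int                   # hidden width of one FFN or one expert
    n_routed_experts: int = 0   # 0 means a plain dense FFN
    n_shared_experts: int = 0   # experts every token passes through
    top_k: int = 0              # routed experts chosen per token


def expert_params(cfg: FFNConfig) -> int:
    # Assumed SwiGLU-style block: gate, up, and down projections, each d_model x d_ff, no biases.
    return 3 * cfg.d_model * cfg.d_ff


def ffn_params(cfg: FFNConfig) -> Tuple[int, int]:
    """Return (stored, active per token) FFN parameters for one layer, router excluded."""
    per_expert = expert_params(cfg)
    if cfg.n_routed_experts == 0:
        # Dense FFN: every stored weight is used for every token.
        return per_expert, per_expert
    stored = per_expert * (cfg.n_routed_experts + cfg.n_shared_experts)
    active = per_expert * (cfg.top_k + cfg.n_shared_experts)
    return stored, active


# Illustrative numbers only, not any particular model's config.
dense = FFNConfig(d_model=4096, d_ff=14336)
moe = FFNConfig(d_model=4096, d_ff=2048, n_routed_experts=64, n_shared_experts=1, top_k=6)

for name, cfg in [("dense", dense), ("shared-expert MoE", moe)]:
    stored, active = ffn_params(cfg)
    print(f"{name}: {stored / 1e6:.0f}M stored, {active / 1e6:.0f}M active per token")
```

With these hypothetical settings, the MoE layer stores roughly 9x more FFN weights than the dense one (about 1.64B versus 176M) while activating the same 176M per token. That gap between stored and active parameters is the kind of difference a side-by-side diagram makes easy to spot.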
Limitations the discussion surfaced
The HN discussion was positive, but not uncritical. Some users asked for higher-resolution figures so diagrams stay readable when zoomed in. Others wanted stronger ordering cues, such as a family-tree style layout or a better sense of how architectures evolved over time and scale. Those requests matter because reference material for model builders now has to support comparison, not just display diagrams.
Why this matters now
Recent open LLMs differ in more than parameter count. Expert routing, local attention, KV-cache strategy, and hybrid block design now affect real serving and training decisions. A readable architecture atlas lowers the friction between blog posts, config.json files, and engineering decisions. HN's reaction shows that this kind of reference is increasingly being treated as a working tool, not just a nice educational extra.
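The KV-cache point can be made concrete with a back-of-the-envelope estimate. The sketch below is a hypothetical helper, not taken from any serving framework: it assumes standard (grouped-query) attention in which each layer caches one key and one value vector per KV head per token, and it models local attention as a sliding window that caps how many tokens a layer keeps. Inputs such as the layer count and KV-head count can often be read from a model's config.json (for example `num_hidden_layers` and `num_key_value_heads` in Llama-style configs), though field names vary across families.

```python
from typing import Optional


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch: int = 1,
    bytes_per_elem: int = 2,        # e.g. a 16-bit cache
    window: Optional[int] = None,   # sliding-window size for local layers
    n_local_layers: int = 0,        # how many of n_layers use the window
) -> int:
    """Rough KV-cache size in bytes (hypothetical helper).

    Global-attention layers cache every token in the sequence; local
    (sliding-window) layers cache at most `window` tokens.
    """
    if not 0 <= n_local_layers <= n_layers:
        raise ValueError("n_local_layers must be between 0 and n_layers")
    if n_local_layers and window is None:
        raise ValueError("local layers need a window size")
    # One key vector and one value vector per KV head, per token, per layer.
    per_token_per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem
    n_global = n_layers - n_local_layers
    local_len = seq_len if window is None else min(seq_len, window)
    tokens_cached = n_global * seq_len + n_local_layers * local_len
    return batch * per_token_per_layer * tokens_cached


GIB = 1024 ** 3

# Hypothetical model: 32 layers, 8 KV heads of size 128, 16-bit cache, 32K context.
all_global = kv_cache_bytes(32, 8, 128, 32_768)
mostly_local = kv_cache_bytes(32, 8, 128, 32_768, window=1_024, n_local_layers=24)
print(f"all global:  {all_global / GIB:.2f} GiB")    # 4.00 GiB
print(f"24/32 local: {mostly_local / GIB:.2f} GiB")  # 1.09 GiB
```

For this hypothetical 32-layer model, switching 24 layers to a 1,024-token window cuts the 32K-token cache from 4 GiB to about 1.1 GiB. The estimate deliberately ignores latent-compressed caches such as DeepSeek's multi-head latent attention, quantized caches, and allocator overhead, so it is an orientation figure rather than a memory budget.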
Source discussion: Hacker News
Original resource: LLM Architecture Gallery