A busy LocalLLaMA thread discusses David Noel Ng’s RYS II results, which argue that repeating mid-stack transformer layers can still improve Qwen3.5-27B, and that hidden states may align more by meaning than by surface language.
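The layer-repetition idea can be sketched at the level of a layer list: build a deeper stack by duplicating a middle span of layers in place. This is a minimal illustration of the general technique (sometimes called depth upscaling or self-merging), not the specific recipe from the RYS II write-up; the span indices and repeat count below are arbitrary.

```python
def repeat_mid_stack(layers, start, end, repeats=2):
    """Return a new layer list with layers[start:end] repeated `repeats` times.

    In a real merge, each entry would be a transformer block (with shared or
    copied weights); here plain strings stand in for the blocks.
    """
    return layers[:start] + layers[start:end] * repeats + layers[end:]

# A toy 8-layer stack; repeating layers 3..4 twice yields a 10-layer stack.
stack = [f"layer_{i}" for i in range(8)]
upscaled = repeat_mid_stack(stack, 3, 5, repeats=2)
print(len(upscaled))   # 10
print(upscaled[3:7])   # ['layer_3', 'layer_4', 'layer_3', 'layer_4']
```

Whether the duplicated blocks share weights or get independent copies (and whether they are fine-tuned afterwards) is a design choice the thread debates; this sketch only shows the structural duplication.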