NVIDIA Releases Star Elastic: One Checkpoint, Three Model Sizes With Zero-Shot Slicing
Original: NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing
What Is Star Elastic
NVIDIA AI's Star Elastic is a novel model architecture that contains 30B, 23B, and 12B reasoning models within a single checkpoint file. Think of it as nested models - a larger model with smaller models embedded inside, like Russian dolls. Users download one file and gain access to all three scales.
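The nested "Russian doll" idea can be sketched as a checkpoint whose smaller variants are subsets of the full model's weights. Everything below is illustrative: the layer counts, the `SLICE_CONFIGS` table, and the `slice_checkpoint` helper are made-up names, not NVIDIA's actual format or API.

```python
# Hypothetical sketch of a nested checkpoint: the smaller models' weights are
# subsets of the full model's weights, so one file serves all three scales.
# Layer counts below are invented for illustration.

SLICE_CONFIGS = {
    "30b": list(range(48)),  # full model: all 48 (hypothetical) layers
    "23b": list(range(36)),  # hypothetical 36-layer subset
    "12b": list(range(20)),  # hypothetical 20-layer subset
}

def slice_checkpoint(checkpoint: dict, size: str) -> dict:
    """Return a sub-model's weights from the single shared checkpoint."""
    keep = set(SLICE_CONFIGS[size])
    return {
        name: weights
        for name, weights in checkpoint.items()
        if not name.startswith("layers.") or int(name.split(".")[1]) in keep
    }

# Toy checkpoint: embedding + 48 layers + output head (weights stubbed).
full = {"embed": "W_embed", "head": "W_head"}
full.update({f"layers.{i}.w": f"W_{i}" for i in range(48)})

small = slice_checkpoint(full, "12b")  # 20 layers + embed + head
```

The point of the sketch is that no separate download or conversion step is needed: each tier is a view over the same stored weights.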
Zero-Shot Slicing
The defining capability is zero-shot slicing: the ability to switch from the full 30B model down to the 23B or 12B model without any additional fine-tuning or downloading. Because the models share a KV cache, this opens up novel hybrid inference workflows - using the 30B model to explore a reasoning path, dropping to 12B to expand it quickly, then scaling back to 30B to evaluate the output.
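The explore-expand-evaluate loop above can be mocked up in a few lines. This is a conceptual sketch only: the `ElasticModel` class, its `slice_to` method, and the cache handling are invented stand-ins, not NVIDIA's inference API.

```python
# Mock of the hybrid workflow: 30B explores, 12B expands fast, 30B verifies.
# All names and mechanics here are hypothetical illustrations.

class ElasticModel:
    def __init__(self):
        self.kv_cache = []  # shared across slices, so no re-prefill on switch
        self.size = "30b"

    def slice_to(self, size: str) -> None:
        # Zero-shot slice: no reload, no fine-tune; the KV cache is reused.
        self.size = size

    def generate(self, prompt: str) -> str:
        self.kv_cache.append((self.size, prompt))  # stand-in for attention state
        return f"[{self.size}] continuation of: {prompt}"

model = ElasticModel()
plan = model.generate("Outline a proof strategy")       # 30B: careful exploration
model.slice_to("12b")
draft = model.generate(plan)                            # 12B: fast expansion
model.slice_to("30b")
verdict = model.generate(f"Check this draft: {draft}")  # 30B: final evaluation
```

The shared KV cache is what makes the switch cheap: the smaller slice can continue from the larger slice's context instead of re-encoding the prompt from scratch.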
A Middle Ground Between Dense and MoE
The r/LocalLLaMA community has likened Star Elastic to a hybrid between dense models and Mixture-of-Experts (MoE). Rather than routing tokens to expert sub-networks, the architecture dynamically strips layers to reduce scale - similar to how scalable video coding can produce UHD, HD, or SD streams from a single encoded bitstream.
Local Deployment
NVIDIA designed Star Elastic with local deployment in mind. The 12B mode is accessible on consumer-grade GPUs, while higher-VRAM setups can take advantage of the full 30B capacity. The shared checkpoint design also simplifies storage - one download covers all three tiers.