Multimodal agents still pay a tax for chaining separate vision, audio, and text models. NVIDIA says Nemotron 3 Nano Omni collapses that stack into a single 30B-parameter model with a 256K-token context window and up to 9.2x higher effective video system capacity at the same responsiveness target.