On March 16, 2026, Microsoft used NVIDIA GTC to expand Foundry Agent Service and observability, add NVIDIA Nemotron models, outline Azure infrastructure built for inference-heavy reasoning workloads, and introduce an Azure Physical AI Toolchain. The announcement is notable because it connects agent operations, hyperscale AI infrastructure, and physical-world systems in one stack.
A new r/LocalLLaMA thread argues that NVIDIA's Nemotron-Cascade-2-30B-A3B deserves more attention after quick local coding evals came in stronger than expected. The post is interesting because it lines up community measurements with NVIDIA's own push for a reasoning-oriented open MoE model that keeps activated parameters low.
NVIDIA and Oracle said on March 16, 2026, that they will build the U.S. Department of Energy's largest AI supercomputer at Argonne National Laboratory. The Solstice and Equinox systems combine 110,000 Blackwell GPUs and a stated 2,200 exaflops of AI performance for scientific discovery.
NVIDIA said on March 12, 2026, that TensorRT Edge-LLM now supports MoE models, Nemotron 2 Nano, Qwen3-TTS/ASR, and Cosmos Reason 2 on Jetson and DRIVE platforms. The company is positioning the runtime as a low-latency edge reasoning layer for robotics and autonomous vehicles.
NVIDIA said on March 20, 2026, that its Cosmos world foundation models have advanced again with Transfer 2.5, Predict 2.5, and Reason 2. The linked NVIDIA Technical Blog frames the update around higher-quality synthetic data, stronger long-tail scenario generation, and richer reasoning for robots and autonomous vehicles.
Ollama said on March 20, 2026, that NVIDIA's Nemotron-Cascade-2 can now run through its local model stack. The official model page positions it as an open 30B MoE model with 3B activated parameters, thinking and instruct modes, and built-in paths into agent tools such as OpenClaw, Codex, and Claude.
The LocalLLaMA discussion around NVIDIA’s new model focused on an unusual mix of scale efficiency and benchmark ambition: 30B total parameters, 3B activated, plus separate thinking and instruct modes.
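The arithmetic behind that efficiency is straightforward: with a top-k router, only k of the n experts run per token, so activated parameters are roughly the shared layers plus k expert blocks rather than all n. A minimal Python sketch with illustrative numbers chosen to land near the 30B/3B split (the expert count, parameter sizes, and router here are assumptions, not Nemotron-Cascade-2's actual architecture):

```python
# Toy top-k MoE routing sketch. All sizes are illustrative assumptions,
# not the real Nemotron-Cascade-2 layout.
import random

N_EXPERTS = 64                # hypothetical number of experts per MoE layer
TOP_K = 4                     # hypothetical experts activated per token
PARAMS_PER_EXPERT = 0.45e9    # hypothetical parameters per expert
SHARED_PARAMS = 1.0e9         # hypothetical embeddings/attention/router params

def route(scores, k):
    """Return the indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Router scores each expert for a token; only the top-k experts execute.
scores = [random.random() for _ in range(N_EXPERTS)]
active = route(scores, TOP_K)

total = SHARED_PARAMS + N_EXPERTS * PARAMS_PER_EXPERT
activated = SHARED_PARAMS + TOP_K * PARAMS_PER_EXPERT
print(f"total ≈ {total/1e9:.1f}B, activated ≈ {activated/1e9:.1f}B")
# → total ≈ 29.8B, activated ≈ 2.8B
```

With these assumed numbers the activated fraction is under 10% of the total, which is the property the thread found appealing for local inference: memory has to hold all experts, but per-token compute scales with k, not n.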
NVIDIA used GTC 2026 to describe how telecom operators are turning distributed network assets into AI grids. The pitch is that inference for low-latency, edge-heavy workloads should move closer to users, devices, and data.
On March 20, 2026, NVIDIA announced SOL-ExecBench, a benchmark for real-world GPU kernels that scores optimized CUDA and PyTorch code against Speed-of-Light hardware bounds on NVIDIA B200 systems. The release packages 235 kernel-optimization problems drawn from 124 AI models across BF16, FP8, and NVFP4 workloads.
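The "Speed-of-Light" framing is at bottom a ratio: how much of the hardware's theoretical peak a kernel actually achieves. A minimal sketch of that idea for a memory-bound kernel, using illustrative figures rather than official B200 specs or SOL-ExecBench's actual scoring code:

```python
# Hedged sketch of a Speed-of-Light (SOL) fraction: achieved throughput
# divided by the hardware's theoretical bound. Figures are illustrative.

def sol_fraction(achieved, peak):
    """Fraction of the theoretical hardware bound actually achieved."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak

# Memory-bound kernel: achieved bandwidth = bytes moved / runtime,
# compared against an assumed peak HBM bandwidth.
bytes_moved = 4 * 1024**3   # 4 GiB of memory traffic (hypothetical kernel)
runtime_s = 0.75e-3         # measured runtime of 0.75 ms (hypothetical)
peak_bw_gbs = 8000          # assumed peak bandwidth in GB/s, illustrative only

achieved_gbs = bytes_moved / runtime_s / 1e9
print(f"SOL fraction: {sol_fraction(achieved_gbs, peak_bw_gbs):.2%}")
```

The same ratio applies to compute-bound kernels by swapping bandwidth for FLOP/s against the peak of the relevant precision (BF16, FP8, NVFP4); a score near 1.0 means the kernel is close to the hardware bound.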
NVIDIAAIDev said on X that Andrej Karpathy's lab has received the first DGX Station GB300 system. NVIDIA's GTC coverage says the deskside machine pairs the GB300 architecture with 784GB of coherent memory, up to 20 petaflops of FP4 performance, and support for models of up to 1 trillion parameters.
Adobe and NVIDIA said on March 16, 2026, that they are forming a strategic partnership to build next-generation Adobe Firefly foundation models and broader agentic creative and marketing workflows. The announcement spans NVIDIA AI infrastructure, a 3D digital twin public beta, and Firefly Foundry integrations aimed at enterprise-grade, commercially safe AI content production.
NVIDIA, Hyundai Motor, and Kia said on March 16, 2026, that they are expanding their strategic partnership around autonomous driving. The collaboration links Hyundai Motor Group's software-defined vehicle capabilities and fleet data with the NVIDIA DRIVE Hyperion platform for systems ranging from Level 2+ assistance to Level 4 robotaxis.