SpatialClaw beats a prior spatial agent by 11.2 points across 20 benchmarks
Spatial reasoning agents may need a better action interface more than a longer list of tools. NVIDIA AI wrote on X that “Code is the right action interface” for these agents, pointing to SpatialClaw, a training-free system that lets a VLM-backed agent write Python inside a persistent kernel. Instead of dispatching only fixed tool calls, the agent can compose perception modules, inspect intermediate outputs, and revise its strategy step by step.
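To make the idea of a persistent kernel concrete, here is a minimal sketch in Python. It is not SpatialClaw's implementation: `PersistentKernel`, `detect_objects`, the file name `scene.jpg`, and the hard-coded boxes are all hypothetical, and a real system would need sandboxing rather than a bare `exec`. It only illustrates how variables from one step stay available to later steps, and how an error can be returned to the agent as text so it can revise its code.

```python
import contextlib
import io
import traceback


class PersistentKernel:
    """Toy stand-in for a persistent Python kernel: state survives across steps."""

    def __init__(self, tools):
        # Tools are injected once and stay available to every later step.
        self.namespace = dict(tools)

    def run(self, code):
        """Execute one agent-written snippet; return (ok, captured text)."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)  # no sandboxing: illustration only
            return True, buffer.getvalue()
        except Exception:
            # Return the traceback instead of raising, so the agent can read it.
            return False, buffer.getvalue() + traceback.format_exc()


def detect_objects(image):
    """Hypothetical perception module; returns hard-coded (x1, y1, x2, y2) boxes."""
    return {"mug": (40, 120, 90, 180), "laptop": (200, 100, 420, 260)}


kernel = PersistentKernel({"detect_objects": detect_objects})

# Step 1: call a perception module and keep its output as a variable.
ok1, out1 = kernel.run("boxes = detect_objects('scene.jpg')\nprint(sorted(boxes))")

# Step 2: a buggy step (wrong key); the error text comes back to the caller.
ok2, out2 = kernel.run("cx = boxes['cup'][0]")

# Step 3: a revised step reuses `boxes` from step 1 without re-running detection.
ok3, out3 = kernel.run(
    "center_x = {k: (b[0] + b[2]) / 2 for k, b in boxes.items()}\n"
    "answer = 'mug' if center_x['mug'] < center_x['laptop'] else 'laptop'\n"
    "print(answer, 'is further left')"
)
print(out1, out3, sep="")
```

In this sketch the failed second step does not discard earlier work: `boxes` is still in the namespace, which is the property that separates a persistent kernel from a sequence of isolated tool calls.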
The linked project page gives the strongest evidence. SpatialClaw reports an 11.2-point margin over a recent prior spatial agent across 20 benchmarks, with no benchmark-specific or model-specific tuning. On the same backbone it improves on 19 of the 20 benchmarks, and it shows consistent gains across six VLM backbones. The page also reports an average gain of 6.5 points over a no-tool baseline, with larger jumps on individual benchmarks: +17.6 points on DSI-Bench, +15.3 on MindCube, and +13.4 on MMSI.
NVIDIA AI’s account typically posts research, developer tooling, and infrastructure updates, and this item is more architectural than promotional. The claim is not that a new model alone solved spatial reasoning, but that executable code lets the agent turn perception outputs into reusable variables and computations. What to watch next is whether this pattern survives outside curated benchmarks: sandboxing, tool-state reproducibility, latency, and error recovery will decide whether code-as-action becomes a common interface for visual agents. The source tweet is available on X.