Gemini Robotics lets Spot follow plain-English home tasks
Original: We teamed up with Boston Dynamics to power their robot Spot with Gemini Robotics embodied reasoning models.
Google DeepMind's April 16 X post is high-signal because it ties Gemini Robotics to a physical robot that already works in industrial settings. The source tweet, posted at 2026-04-16 13:03:32 UTC and safely inside the freshness window, says the team used "Gemini Robotics embodied reasoning models" to power Boston Dynamics' Spot.
A follow-up tweet explains the bridge: instead of writing complex code, the team interacted with Spot in plain English, giving Gemini Robotics-ER a basic set of tools to move, take photos, and grab objects. The linked Boston Dynamics blog post says the demo grew out of a 2025 hackathon and used Spot's SDK to translate Gemini Robotics outputs into robot API calls. The post also notes strict boundaries: Gemini Robotics could only use the tools exposed through the API.
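To make that bridge concrete, here is a minimal sketch of how a constrained tool set can be declared to a Gemini model through the google-genai SDK's function calling. The tool names, parameter schemas, and the model id string are illustrative assumptions, not Boston Dynamics' actual integration.

```python
# Minimal sketch: declaring a constrained tool set to a Gemini model via
# the google-genai SDK's function calling. Tool names, schemas, and the
# model id are illustrative assumptions, not Boston Dynamics' code.
from google import genai
from google.genai import types

move_to = types.FunctionDeclaration(
    name="move_to",
    description="Walk the robot to a named waypoint on the site map.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"waypoint": types.Schema(type=types.Type.STRING)},
        required=["waypoint"],
    ),
)
take_photo = types.FunctionDeclaration(
    name="take_photo",
    description="Capture an image from the robot's front camera.",
)

client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # assumed id; check the Gemini API docs
    contents="Go to the workbench and photograph the red toolbox.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[move_to, take_photo])],
    ),
)
# The model can only respond with calls drawn from the declared tools.
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, dict(part.function_call.args))
```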
The architecture detail is important because it keeps the model away from direct, unconstrained robot control. Boston Dynamics describes a tool interface: Gemini Robotics interprets a natural-language request, chooses from exposed capabilities, and Spot's existing APIs execute the concrete robot actions. That split is a common pattern for applied robotics because it gives developers places to enforce limits, log decisions, and recover when a plan fails. It also means the headline capability is not "a robot understands everything"; it is that a foundation model can sit above a narrower set of tested robot primitives and compose them into useful tasks.
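A rough sketch of that split, with hypothetical Python handlers standing in for real Spot SDK (bosdyn-client) calls; the handler names and return conventions are assumptions for illustration:

```python
# Dispatcher sketch for the "model plans, robot APIs execute" split.
# The handlers below are placeholders for real Spot SDK calls; their
# names and signatures are hypothetical.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("spot-bridge")

def move_to(waypoint: str) -> bool:
    """Placeholder for a Spot SDK navigation call."""
    log.info("navigating to %s", waypoint)
    return True  # a real handler would report navigation success/failure

def take_photo() -> bool:
    """Placeholder for a Spot SDK image-capture call."""
    log.info("capturing front-camera image")
    return True

# Allowlist: the model can reach these capabilities and nothing else.
TOOLS = {"move_to": move_to, "take_photo": take_photo}

def execute(plan: list[dict]) -> None:
    """Run model-proposed tool calls with logging and simple recovery."""
    for step in plan:
        name, args = step["name"], step.get("args", {})
        handler = TOOLS.get(name)
        if handler is None:
            log.warning("rejected unknown tool: %s", name)
            continue  # an out-of-allowlist call never touches the robot
        log.info("executing %s(%s)", name, args)
        if not handler(**args):
            log.error("%s failed; stopping so a planner can replan", name)
            break

execute([{"name": "move_to", "args": {"waypoint": "workbench"}},
         {"name": "take_photo"}])
```

The TOOLS allowlist is where the boundary described above gets enforced: a call the bridge does not recognize is rejected in software before anything reaches the robot, and every decision leaves a log entry to replay when a plan fails.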
Google DeepMind's account usually posts research, model releases, and applied AI demos. Boston Dynamics' write-up makes this more than a polished video: it describes a tool layer for navigation, image capture, object identification, grasping, and placement. What to watch next is whether this stays a lab demo or becomes a repeatable developer pattern for Spot and Orbit customers. The hard questions are latency, failure recovery, and how much natural-language flexibility can be allowed around a robot arm in real spaces.
Related Articles
Google DeepMind is pushing embodied reasoning closer to deployable robotics, not just lab demos. According to the linked thread and blog post, Gemini Robotics-ER 1.6 reaches 93% on instrument reading with agentic vision and improves injury-risk detection in video by 10% over Gemini 3.0 Flash.
HN focused less on the model drop and more on the hard robotics question: how fast does reasoning need to be before it is useful in the physical world? Google DeepMind frames Gemini Robotics-ER 1.6 around spatial reasoning, multi-view understanding, success detection, and instrument reading, while commenters zoomed in on gauge-reading demos, latency, and deployment reality.
Google DeepMind's latest robotics model pushes a hard industrial task from 23% to 93% accuracy when agentic vision is enabled, putting a concrete number on embodied reasoning progress. The April 14 release also puts Gemini Robotics-ER 1.6 into the Gemini API and Google AI Studio, so developers can test the upgrade immediately.