DeepMind's robot model jumps from 23% to 93% on gauge reading
Original: Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning
Google DeepMind’s April 14 release of Gemini Robotics-ER 1.6 matters because it ties robotics progress to a number that operators can actually understand. On instrument reading, a task that asks a robot to interpret gauges and sight glasses in industrial settings, DeepMind says Gemini Robotics-ER 1.5 scored 23%, Gemini 3.0 Flash reached 67%, Gemini Robotics-ER 1.6 hit 86%, and the same model rose to 93% with agentic vision. That is not a vague claim about better reasoning. It is a sharp jump on a job that sits directly inside facility inspection, maintenance, and autonomous monitoring.
DeepMind says ER 1.6 improves spatial reasoning, multi-view understanding, pointing, counting, and success detection. The company also says the instrument-reading use case came from close work with Boston Dynamics. In the blog post, DeepMind describes how Spot can move through a facility, capture images of thermometers, pressure gauges, and chemical sight glasses, and feed those images into the model. The new release is aimed at turning that image stream into reliable readings without a human needing to zoom, interpret, and log each instrument by hand.
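Concretely, that workflow is a capture-infer-log loop. The sketch below is a minimal, hypothetical version of it: the waypoint names and both helper functions are illustrative stand-ins, not a Boston Dynamics or DeepMind API, and the readings are canned so the dry run executes end to end.

```python
# Hypothetical inspection loop: capture a frame at each waypoint,
# ask a vision model for a structured reading, and log the result.
# All names here are illustrative stand-ins; swap in real camera
# capture and a real model call for an actual deployment.
import json
import time

WAYPOINTS = ["boiler_pressure_gauge", "coolant_sight_glass", "intake_thermometer"]

def capture_image(waypoint: str) -> bytes:
    # Stand-in for the robot capturing a frame at a named waypoint.
    return b"<jpeg bytes>"

def read_instrument(image: bytes) -> dict:
    # Stand-in for a vision-model call; returns a canned structured reading.
    return {"value": 4.2, "unit": "bar", "confidence": 0.9}

log = []
for waypoint in WAYPOINTS:
    frame = capture_image(waypoint)
    reading = read_instrument(frame)
    log.append({"waypoint": waypoint, "timestamp": time.time(), **reading})

print(json.dumps(log, indent=2))
```

The point of the structure is that every instrument becomes a timestamped record instead of a photo a human has to squint at later.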
The technical detail worth watching is how DeepMind says the model gets there. Agentic vision combines visual reasoning with code execution so the system can zoom into a gauge, estimate proportions and intervals, and then apply world knowledge to interpret the reading. That is a more practical robotics story than a flashy demo clip. DeepMind has also put ER 1.6 into the Gemini API and Google AI Studio, and published a Colab notebook, which narrows the gap between a research post and developer experimentation.
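Since ER 1.6 is exposed through the Gemini API, a gauge-reading call might look like the sketch below, using the google-genai Python SDK. The model id is an assumption, so check Google AI Studio for the real identifier, and enabling the code-execution tool is a guess at how to approximate the agentic-vision behavior; the post does not spell out the exact configuration.

```python
# Hedged sketch of a gauge-reading request via the google-genai SDK.
# The model id below is assumed, not confirmed by DeepMind's post.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

with open("pressure_gauge.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # assumed id; verify in Google AI Studio
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Read this pressure gauge. Report the value, the unit, and "
        "whether the needle is inside the marked safe range.",
    ],
    config=types.GenerateContentConfig(
        # Code execution lets the model run code over the image, in the
        # spirit of agentic vision's zoom-and-measure loop.
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(response.text)
```

A production pipeline would likely request a structured output schema rather than free text, but free text keeps the sketch short.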
Safety is the other signal in this release. DeepMind says ER 1.6 is its safest robotics model so far and reports gains over baseline Gemini 3.0 Flash on hazard detection tasks drawn from real-life injury reports: +6% in text scenarios and +10% in video scenarios. Those numbers do not guarantee deployment readiness, but they do show where the company is focusing. The next thing to watch is whether this kind of benchmark lead turns into broader real-world inspection deployments beyond the Boston Dynamics example.
Related Articles
Google DeepMind is pushing embodied reasoning closer to deployable robotics, not just lab demos. In the linked thread and blog post, Gemini Robotics-ER 1.6 reaches 93% on instrument reading with agentic vision and improves injury-risk detection in video by 10% over Gemini 3.0 Flash.
HN focused less on the model drop and more on the hard robotics question: how fast does reasoning need to be before it is useful in the physical world? Google DeepMind frames Gemini Robotics-ER 1.6 around spatial reasoning, multi-view understanding, success detection, and instrument reading, while commenters zoomed in on gauge-reading demos, latency, and deployment reality.
Google DeepMind introduced D4RT on January 22, 2026 as a unified model for dynamic 4D scene reconstruction and tracking. The company says it runs 18x to 300x faster than prior methods and is efficient enough for real-time applications in robotics and augmented reality.