Google tests AMIE in real outpatient care and reports zero safety stops
Original: Exploring the feasibility of conversational diagnostic AI in a real-world clinical study
From benchmark to clinic
On March 11, 2026, Google Research and Google DeepMind published a prospective real-world feasibility study of AMIE, a conversational diagnostic AI. The work, conducted with Beth Israel Deaconess Medical Center, aimed to test whether a diagnostic assistant that had looked promising in simulated evaluations could operate safely and usefully in actual ambulatory primary care.
The study was pre-registered, IRB-approved, and conducted at a single center. One hundred adult patients completed an AMIE interaction before seeing a physician, and 98 later attended their scheduled appointment. Google says a human AI supervisor was available to intervene under four predefined safety criteria, but no safety stop was triggered during the study.
What the results show
Google reports that AMIE performed on par with primary care physicians on the quality of the overall management plan and on differential-diagnosis quality. Primary care physicians still outperformed AMIE on the practicality and cost-effectiveness of management plans, an important reminder that real care delivery requires operational judgment, not only diagnostic reasoning.
AMIE’s differential diagnosis included the final physician diagnosis in 90% of cases and reached 75% top-3 accuracy. Google also says patient trust in the AI system increased after the interaction and remained elevated at follow-up. Those signals suggest that conversational diagnostic systems may be clinically useful as intake and decision-support tools, especially when they help structure information before a visit.
- Scale of test: 100 completed patient interactions, 98 subsequent appointments.
- Safety monitoring: no intervention by the human AI supervisor was required.
- Performance nuance: parity in some diagnostic measures, but physicians remained better on practicality and cost.
Google is careful not to overclaim. The company notes that this was a feasibility study, not a controlled proof of clinical efficacy. The system was text-based, run at a single center, and should not yet be read as a replacement for physician workflow. Even so, the study is notable because it moves diagnostic AI evaluation out of synthetic benchmarks and into real care settings, which is the harder test for any medical AI system.