Google tests AMIE in real outpatient care and reports zero safety stops

Original: Exploring the feasibility of conversational diagnostic AI in a real-world clinical study

Sciences · Mar 27, 2026 · By Insights AI

From benchmark to clinic

On March 11, 2026, Google Research and Google DeepMind published a prospective real-world feasibility study of a conversational diagnostic AI system called AMIE. The work, conducted with Beth Israel Deaconess Medical Center, aimed to test whether a diagnostic assistant that had looked promising in simulated evaluations could operate safely and usefully in actual ambulatory primary care.

The study was pre-registered, IRB-approved, and conducted at a single center. One hundred adult patients completed an AMIE interaction before seeing a physician, and 98 later attended their scheduled appointment. Google says a human supervisor was available to halt the AI interaction according to four predefined safety criteria, but no safety stop was triggered during the study.

What the results show

Google reports that AMIE performed on par with primary care physicians on overall management-plan quality and on differential-diagnosis quality. Primary care physicians still outperformed AMIE on the practicality and cost-effectiveness of management plans, an important reminder that real care delivery includes operational judgment, not only diagnostic reasoning.

AMIE’s differential diagnosis included the physician’s final diagnosis in 90% of cases, with top-3 accuracy of 75%. Google also says patient trust in the AI system increased after the interaction and remained elevated at follow-up. Those signals suggest that conversational diagnostic systems may be clinically useful as intake and decision-support tools, especially when they help structure information before a visit.

  • Scale of test: 100 completed patient interactions, 98 subsequent appointments.
  • Safety monitoring: no intervention by the human supervisor was required.
  • Performance nuance: parity in some diagnostic measures, but physicians remained better on practicality and cost.

Google is careful not to overclaim. The company notes that this was a feasibility study, not a controlled proof of clinical efficacy. The system was text-based, run at a single center, and should not yet be read as a replacement for physician workflow. Even so, the study is notable because it moves diagnostic AI evaluation out of synthetic benchmarks and into real care settings, which is the harder test for any medical AI system.
