Harvard Study in Science: OpenAI's o1 Outperforms ER Physicians on Diagnostic Accuracy
Study Overview
A peer-reviewed study from Harvard Medical School and Beth Israel Deaconess Medical Center, published in Science, found that OpenAI's o1 model outperformed two attending physicians in diagnosing real emergency room cases.
Key Numbers
- 76 real ER triage cases evaluated
- OpenAI o1 exact or near-exact diagnoses: 67%
- Two internal medicine physicians: 55% and 50%
- On 5 detailed clinical case studies: o1 scored 89%, versus 34% for 46 doctors using conventional search tools
Methodology
Both the model and the physicians received identical, unprocessed EHR data as text. No additional images or lab data were provided, mirroring the information clinicians actually have available.
Significance and Caveats
Researchers emphasized augmentation over replacement, positioning AI as a second-opinion tool for time-pressured ER clinicians. The 76-case sample is too small to support regulatory approval, and further studies covering rare diseases and complex comorbidities are needed before clinical deployment.
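To give a rough sense of why a 76-case sample leaves room for uncertainty, the sketch below computes approximate 95% Wilson score intervals for the reported accuracy figures. It is an illustration only, not the study's own analysis: the case counts are assumed by rounding each reported percentage against 76 cases, and the labels "physician A" and "physician B" are placeholders.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


N_CASES = 76  # number of ER triage cases reported in the article

# Reported accuracy rates; the labels for the two physicians are placeholders.
reported = {"o1": 0.67, "physician A": 0.55, "physician B": 0.50}

for name, rate in reported.items():
    # Assumption: the underlying count is the reported rate rounded to whole cases.
    correct = round(rate * N_CASES)
    low, high = wilson_interval(correct, N_CASES)
    print(f"{name}: {correct}/{N_CASES} correct, ~95% interval {low:.0%} to {high:.0%}")
```

Under these assumptions each interval extends roughly ten percentage points on either side of the reported rate. Because the model and the physicians were evaluated on the same cases, a formal comparison would need a paired analysis, which this sketch does not attempt.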
Source: TechCrunch