Harvard Study in Science: OpenAI's o1 Outperforms ER Physicians on Diagnostic Accuracy

Study Overview

A peer-reviewed study from Harvard Medical School and Beth Israel Deaconess Medical Center, published in Science, found that OpenAI's o1 model outperformed two attending physicians in diagnosing real emergency room cases.

Key Numbers

76 real ER triage cases evaluated
OpenAI o1 exact or near-exact diagnoses: 67%
Two internal medicine physicians: 55% and 50%
On 5 detailed clinical case studies: o1 scored 89%, versus 34% for 46 doctors using conventional search tools

Methodology

The model and the physicians received the same unprocessed electronic health record (EHR) data as text. No additional images or lab data were provided, mirroring the information actually available to clinicians.

Significance and Caveats

Researchers emphasized augmentation over replacement, positioning AI as a second-opinion tool for time-pressured ER clinicians. The 76-case sample is too small to support regulatory approval, and further studies covering rare diseases and complex comorbidities are needed before clinical deployment.
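
To give a rough sense of why 76 cases is a small sample, the sketch below computes a 95% Wilson score interval around each reported accuracy. It is an illustration only and not the study's own analysis. The correct-case counts are an assumption: they are back-calculated from the rounded percentages above, and the labels "physician A" and "physician B" are hypothetical placeholders for the two physicians.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


N_CASES = 76  # number of ER triage cases reported in the article

# ASSUMPTION: the article gives only rounded percentages, so the counts
# derived below are approximations, not figures taken from the study.
REPORTED_ACCURACY = {
    "o1": 0.67,
    "physician A": 0.55,  # hypothetical label
    "physician B": 0.50,  # hypothetical label
}

for name, rate in REPORTED_ACCURACY.items():
    approx_correct = round(rate * N_CASES)
    low, high = wilson_interval(approx_correct, N_CASES)
    print(f"{name}: ~{approx_correct}/{N_CASES} correct, "
          f"95% interval {low:.0%} to {high:.0%}")
```

Under these assumptions, each interval spans roughly ten percentage points on either side of its point estimate, and the model's interval overlaps the higher-scoring physician's. This treats each accuracy independently and ignores that all three were scored on the same cases, so it is a back-of-the-envelope check and not a formal comparison.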

Source: TechCrunch

Sciences Apr 14, 2026 2 min read

OpenAI Says ChatGPT Is Becoming a Scientific Collaborator

OpenAI says ChatGPT is already being used at research scale across science and mathematics. In its January 2026 report, the company says advanced science and math usage reached nearly 8.4 million weekly messages from roughly 1.3 million weekly users, and it cites early evidence that GPT-5.2 is contributing to serious mathematical work.

#openai #science #chatgpt

Sciences Reddit May 2, 2026 1 min read

LLMs Match or Exceed ER Physicians in Diagnostic Tasks, Science Study Finds

A new study published in Science found that a state-of-the-art LLM matched or exceeded human emergency physicians in diagnostic choices, emergency triage, and next-step management decisions using real ER data and hundreds of physician comparisons. Researchers say the results call for collaborative care models, not AI replacement of doctors.

#ai-medicine #healthcare #llm

Sciences X/Twitter 1d ago 1 min read

Astra turns 10 open problems into Lean-checked research claims

OpenAI’s next major model family, Astra, is being tested through research outputs rather than benchmarks alone. The company says an internal version produced 10 results and that finding them would cost roughly $2,000 at Sol API rates.

#openai #astra #lean