LLMs Match or Exceed ER Physicians in Diagnostic Tasks, Science Study Finds
The Study
A new study published in Science directly compared AI and human emergency physicians on clinical diagnostic tasks. Using real emergency department data and hundreds of physician comparisons, a state-of-the-art LLM matched or exceeded human clinician performance across three key areas: diagnostic choices, emergency triage, and determining next management steps.
Collaborative Care, Not Replacement
The authors are explicit that these results do not mean AI models are ready to replace doctors. Instead, the findings indicate that medicine needs faster, more rigorous standardized benchmarks for evaluating AI capabilities in clinical settings. The researchers propose a collaborative care model, in which AI assists physician decision-making while humans retain final judgment, as the appropriate framework for integration.
A New Benchmark for Medical AI
The study builds on decades of using difficult diagnostic cases to evaluate medical computing systems. What makes it notable is the combination of real ER data with large-scale physician comparison — not a controlled research environment. The accumulating evidence that AI can outperform physicians in specific diagnostic contexts is shifting the conversation from "can AI do this" to "how do we safely integrate it." The study adds significant weight to that shift.
Related Articles
r/MachineLearning upvoted this paper because it did not promise a miracle. It argued that deep learning theory is finally accumulating enough converging evidence to resemble a genuine scientific program, and commenters preferred its concrete framing to yet another grand AI manifesto.
The important medical AI story here is not replacement but reliability. Google DeepMind says its AI co-clinician produced zero critical errors in 97 of 98 realistic primary-care queries, while physicians still beat it overall in multimodal telemedicine simulations.
OpenAI says ChatGPT is already being used at research scale across science and mathematics. In its January 2026 report, the company says advanced science and math usage reached nearly 8.4 million weekly messages from roughly 1.3 million weekly users, with early evidence that GPT-5.2 is contributing to serious mathematical work.