ChatGPT health answers cut flagged factual issues by 71%
Original: Improving health intelligence in ChatGPT
Health is one of the highest-stakes places people use ChatGPT, and OpenAI is now putting numbers on how much that surface has changed. The company says more than 230 million people each week ask ChatGPT for help with health and wellness questions, from lab results to appointment preparation and insurance navigation.
The sharpest figure is a 71% decline. OpenAI says privacy-preserving monitors on recent production traffic found that the rate of health responses with at least one flagged factuality issue fell by 71% over the last two months. That matters because the claim is not limited to a lab demo; it is tied to health traffic measured across billions of messages a week.
The model behind the update is GPT-5.5 Instant, which OpenAI says is available to free ChatGPT users subject to limits. In its health-specific evaluations, including HealthBench and HealthBench Professional, GPT-5.5 Instant reached performance comparable to the company’s latest frontier Thinking models and improved substantially over GPT-5.3 Instant.
OpenAI also used physician comparison studies. Doctors, given unlimited time and internet access, wrote responses to representative health conversations, while a separate physician panel reviewed 3,500 responses across accuracy, communication, completeness, instruction following, and usefulness for health decisions. OpenAI says GPT-5.5 Instant responses were rated higher than both physician-written responses and older-model responses across those criteria.
The practical story is not that ChatGPT becomes a doctor. The improvement OpenAI is describing is narrower and more important: better recognition of urgent-care signals, more relevant follow-up questions, clearer uncertainty, and fewer unsupported claims. For consumer health AI, the bar is shifting from fluent medical language to measurable reductions in risky confidence.