ChatGPT could miss your serious medical emergency, new study suggests


This story discusses suicide. If you or someone you know is having thoughts of suicide, please contact the Suicide & Crisis Lifeline at 988 or 1-800-273-TALK (8255).

Artificial intelligence has been touted as a boon to healthcare, but a new study has revealed its potential shortcomings when it comes to giving medical advice.

In January 2026, OpenAI launched ChatGPT Health, the medical-focused version of the popular chatbot tool.

The company introduced the tool as “a dedicated experience that securely brings your health information and ChatGPT’s intelligence together, to help you feel more informed, prepared and confident navigating your health.”

But researchers at the Icahn School of Medicine at Mount Sinai have found that the tool failed to recommend emergency care for a “significant number” of serious medical cases.

The study, published in the journal Nature Medicine on Feb. 23, aimed to explore how ChatGPT Health, which is reported to have about 40 million users daily, handles situations where people are asking whether to seek emergency care.


“Right now, no independent body evaluates these products before they reach the public,” said lead author Ashwin Ramaswamy, M.D.

“We wouldn’t accept that for a medication or a medical device, and we shouldn’t accept it for a product that tens of millions of people are using to make health decisions.”

Emergency scenarios

The team created 60 clinical scenarios across 21 medical specialties, ranging from minor conditions to true medical emergencies.

Three independent physicians then assigned an appropriate level of urgency to each case, based on published clinical practice guidelines from 56 medical societies.


The researchers conducted 960 interactions with ChatGPT Health to see how the tool responded, taking into account gender, race, barriers to care and “social dynamics.”

While “clear-cut emergencies” — such as stroke or severe allergy — were generally handled well, the researchers found that the tool “under-triaged” many urgent medical issues.


For example, in one asthma scenario, the system acknowledged that the patient was showing early signs of respiratory failure but still recommended waiting instead of seeking emergency care.

“ChatGPT Health performs well in medium-severity cases, but fails at both ends of the spectrum, the cases where getting it right matters most. It under-triaged over half of genuine emergencies and over-triaged roughly two-thirds of mild cases that clinical guidelines say should be managed at home.”


Under-triage can be life-threatening, the doctor noted, while over-triage can overwhelm emergency departments and delay care for those in real need.

Researchers also identified inconsistencies in suicide risk alerts. In some cases, the tool directed users to the 988 Suicide and Crisis Lifeline in lower-risk scenarios, and in others, it failed to offer that recommendation even when a person discussed suicidal ideations.


“The suicide guardrail failure was the most alarming,” said study co-author Girish N. Nadkarni, M.D.

ChatGPT Health is designed to show a crisis intervention banner when someone describes thoughts of self-harm, the researcher noted.


“We tested it with a 27-year-old patient who said he’d been thinking about taking a lot of pills,” Nadkarni said. “When he described his symptoms alone, the banner appeared 100% of the time. Then we added normal lab results (same patient, same words, same severity) and the banner vanished.”

“A safety feature that works perfectly in one context and completely fails in a nearly identical context … is a fundamental safety problem.”


The researchers were also surprised by the social influence aspect.

“When a family member in the scenario said ‘it’s nothing serious,’ which happens all the time in real life, the system became nearly 12 times more likely to downplay the patient’s symptoms,” Nadkarni said. “Everyone has a spouse or parent who tells them they’re overreacting. The AI shouldn’t be agreeing with them during a potential emergency.”

Physicians react

Dr. Siegel described the study as “important.”

“It underlines the principle that while large language models can triage clear-cut emergencies, they have much more trouble with nuanced situations,” Siegel said.


“This is where doctors and clinical judgment come in: knowing the nuances of a patient’s history and how they report symptoms and their approach to health.”

ChatGPT and other LLMs can be helpful tools, Siegel said, but they “should not be used to give medical direction.”

“Machine learning and continued input of data can help, but will never compensate for the essential problem: human judgment is needed to decide whether something is a true emergency or not.”


Dr. Harvey Castro, an emergency physician and AI expert in Texas, echoed the importance of the study, calling it “exactly the kind of independent safety evaluation we need.”

“Innovation moves fast. Oversight has to move just as fast,” Castro said. “In healthcare, the most dangerous mistakes happen at the extremes, when something looks mild but is actually catastrophic. That’s where clinical judgment matters most, and where AI must be stress-tested.”

Study limitations

The researchers acknowledged some potential limitations in the study design.

“We used physician-written clinical scenarios rather than real patient conversations, and we tested at a single point in time. These systems update frequently, so performance may change.”


Because the system had to choose just one fixed urgency category, the test may not reflect the more nuanced advice it might give in a back-and-forth conversation, the researchers noted.


Also, the study wasn’t large enough to confidently detect small differences in how recommendations might vary by race or gender.

“We need continuous auditing, not one-time studies,” Castro noted. “These systems update frequently, so evaluation must be ongoing.”

‘Don’t wait’

The researchers emphasized the importance of seeking immediate care for serious issues.

“… severe allergic reaction, thoughts of self-harm: go to the emergency department or call 988,” Ramaswamy advised. “Don’t wait for an AI to tell you it’s OK.”

The researchers noted that they support the use of AI to improve healthcare access, and that they didn’t conduct the study to “tear down the technology.”


“These tools can be genuinely useful for the right things — understanding a diagnosis you’ve already received, looking up what your medications do and their side effects, or getting answers to questions that didn’t get fully addressed in a short doctor’s visit,” Ramaswamy said.

“That’s a very different use case from deciding whether you need emergency care. Treat them as a complement to your doctor, not a replacement.”


Castro agreed that the benefits of AI health tools should be weighed against the risks.

“AI health tools can increase access, reduce unnecessary visits and empower patients with information,” he said. “They are not inherently unsafe, but they are not yet substitutes for clinical judgment.”


“This study doesn’t mean we abandon AI in healthcare,” he went on. “It means we mature it. Independent testing and stronger guardrails will determine whether AI becomes a safety net or a liability.”

