ChatGPT Health fails to recognise medical emergencies – study

Original link: https://www.theguardian.com/technology/2026/feb/26/chatgpt-health-fails-recognise-medical-emergencies

## ChatGPT Health: new study raises safety concerns

A study recently published in Nature Medicine has revealed significant safety flaws in OpenAI's ChatGPT Health, raising concerns about potential harm to users. The AI platform, used by more than 40 million people every day for health queries, **regularly fails to recognise critical medical emergencies and does not consistently detect suicidal ideation.** Researchers found that ChatGPT Health under-triaged more than half of the cases presented to it, advising patients who needed urgent care to rest at home or book a routine appointment – a 50/50 risk in situations such as respiratory failure or diabetic crisis. The AI was also easily swayed by seemingly innocuous information, such as a "friend" in the scenario downplaying symptoms, leading it to underestimate risk. Alarmingly, **when normal lab results were added to a patient's description, the safety net for suicidal thoughts disappeared**, creating a false sense of security. Experts warn this could lead to delayed treatment, unnecessary harm and even death. While OpenAI argues that the study does not reflect real-world use and that the model is continuously improving, researchers stress the urgent need for robust safety standards and independent oversight.

## ChatGPT and medical advice: growing concerns

Recent reports and discussions on Hacker News have highlighted reliability problems with using AI such as ChatGPT for medical advice. While some users find it helpful for minor issues, others have experienced serious failures – including a case in which ChatGPT missed a diagnosis that led to emergency surgery. The core problem is that current AI behaves more like a "knowledgeable friend" than a qualified professional. Users caution against relying on AI for critical decisions, noting that it can bias a doctor's judgment and that it lacks the nuanced experience of human physicians. There are also concerns that entities such as governments and insurance companies could misuse AI to cut costs rather than serve patient wellbeing. Despite these risks, AI tools are increasingly being adopted by medical professionals as aids, though always under expert supervision. The discussion also touched on the need for rigorous testing (red-teaming), and on the fact that while human doctors are not infallible, they are held to higher standards and undergo more rigorous training. Ultimately, the consensus leans toward caution, emphasising AI's current limitations in high-stakes domains such as healthcare.

## Original article

ChatGPT Health regularly misses the need for medical urgent care and frequently fails to detect suicidal ideation, a study of the AI platform has found, which experts worry could “feasibly lead to unnecessary harm and death”.

OpenAI launched the “Health” feature of ChatGPT to limited audiences in January, which it promotes as a way for users to “securely connect medical records and wellness apps” to generate health advice and responses. More than 40 million people reportedly ask ChatGPT for health-related advice every day.

The first independent safety evaluation of ChatGPT Health, published in the February edition of the journal Nature Medicine, found it under-triaged more than half of the cases presented to it.

The lead author of the study, Dr Ashwin Ramaswamy, said: "We wanted to answer the most basic safety question: if someone is having a real medical emergency and asks ChatGPT Health what to do, will it tell them to go to the emergency department?"

Ramaswamy and his colleagues created 60 realistic patient scenarios covering health conditions from mild illnesses to emergencies. Three independent doctors reviewed each scenario and agreed on the level of care needed, based on clinical guidelines.


The team then asked ChatGPT Health for advice on each case under different conditions, including changing the patient’s gender, adding test results, or adding comments from family members, generating nearly 1,000 responses.

They then compared the platform’s recommendations with the doctors’ assessments.

While it performed well in textbook emergencies such as stroke or severe allergic reactions, it struggled in other situations. In one asthma scenario, it advised waiting rather than seeking emergency treatment despite the platform identifying early warning signs of respiratory failure.

In 51.6% of cases where someone needed to go to the hospital immediately, the platform said stay home or book a routine medical appointment, a result Alex Ruani, a doctoral researcher in health misinformation mitigation with University College London, described as “unbelievably dangerous”.

“If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal,” she said. “What worries me most is the false sense of security these systems create. If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life.”

In one of the simulations, eight times out of 10 (84%), the platform sent a suffocating woman to a future appointment she would not live to see, Ruani said. Meanwhile, 64.8% of completely safe individuals were told to seek immediate medical care, said Ruani, who was not involved in the study.

The platform was also nearly 12 times more likely to downplay symptoms because the “patient” told it a “friend” in the scenario suggested it was nothing serious.

“It is why many of us studying these systems are focused on urgently developing clear safety standards and independent auditing mechanisms to reduce preventable harm,” Ruani said.

A spokesperson for OpenAI said while the company welcomed independent research evaluating AI systems in healthcare, the study did not reflect how people typically use ChatGPT Health in real life. The model is also continuously updated and refined, the spokesperson said.

Ruani said even though simulations created by the researchers were used, “a plausible risk of harm is enough to justify stronger safeguards and independent oversight”.

Ramaswamy, a urology instructor at the Icahn School of Medicine at Mount Sinai in the US, said he was particularly concerned by the platform's under-reaction to suicidal ideation.

“We tested ChatGPT Health with a 27-year-old patient who said he’d been thinking about taking a lot of pills,” he said. When the patient described his symptoms alone, the crisis intervention banner linking to suicide help services appeared every time.

“Then we added normal lab results,” Ramaswamy said. “Same patient, same words, same severity. The banner vanished. Zero out of 16 attempts. A crisis guardrail that depends on whether you mentioned your labs is not ready, and it’s arguably more dangerous than having no guardrail at all, because no one can predict when it will fail.”

Prof Paul Henman, a digital sociologist and policy expert with the University of Queensland, said: “This is a really important paper.

“If ChatGPT Health was used by people at home, it could lead to higher numbers of unnecessary medical presentations for low-level conditions and a failure of people to obtain urgent medical care when required, which could feasibly lead to unnecessary harm and death.”

He said it also raised the prospects of legal liability, with legal cases against tech companies already in motion in relation to suicide and self-harm after using AI chatbots.

“It is not clear what OpenAI is seeking to achieve by creating this product, how it was trained, what guardrails it has introduced and what warnings it provides to users,” Henman said.

“Because we don’t know how ChatGPT Health was trained and what the context it was using, we don’t really know what is embedded into its models.”
