(Comments)

Original link: https://news.ycombinator.com/item?id=43496644

A Hacker News discussion centers on an article highlighting that AI models are less accurate at diagnosing disease in Black and female patients. Commenters attribute this to bias in the training data, reflecting long-standing inequities in medical research and treatment. Microsoft's MedFuzz research is cited, showing how misleading prompt details can skew a large language model's diagnosis, particularly when racial or socioeconomic factors are added. Suggested mitigations include making race and gender explicit model inputs, using diverse and inclusive training data ("DIET"), and specializing models. Some argue that even imperfect AI can raise disease detection rates for parts of the population and advocate careful deployment and validation; others warn against deploying flawed technology that could harm patients. A central question is how to use AI systems ethically when they perform differently across demographic groups, balancing potential benefits against the risk of widening disparities.

Related articles
  • AI models miss disease in Black and female patients 2025-03-27
  • (Comments) 2024-02-23
  • (Comments) 2023-12-09
  • (Comments) 2025-02-25
  • (Comments) 2025-03-23

  • Original
    AI models miss disease in Black and female patients (science.org)
    53 points by pseudolus 51 minutes ago | 24 comments

    "AIs want the future to be like the past, and AIs make the future like the past. If the training data is full of human bias, then the predictions will also be full of human bias, and then the outcomes will be full of human bias, and when those outcomes are copraphagically fed back into the training data, you get new, highly concentrated human/machine bias.”

    https://pluralistic.net/2025/03/18/asbestos-in-the-walls/#go...



    I came across a fascinating Microsoft research paper on MedFuzz (https://www.microsoft.com/en-us/research/blog/medfuzz-explor...) that explores how adding extra, misleading prompt details can cause large language models (LLMs) to arrive at incorrect answers.

    For example, a standard MedQA question describes a 6-year-old African American boy with sickle cell disease. Normally, the straightforward details (e.g., jaundice, bone pain, lab results) lead to “Sickle cell disease” as the correct diagnosis. However, under MedFuzz, an “attacker” LLM repeatedly modifies the question—adding information like low-income status, a sibling with alpha-thalassemia, or the use of herbal remedies—none of which should change the actual diagnosis. These additional, misleading hints can trick the “target” LLM into choosing the wrong answer. The paper highlights how real-world complexities and stereotypes can significantly reduce an LLM’s performance, even if it initially scores well on a standard benchmark.
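
    Roughly, the attack runs as an iterative loop between an attacker LLM and a target LLM. A minimal sketch of that loop (the ask_llm helper, the prompts, and the turn budget are illustrative assumptions on my part, not code from the paper):

        # Illustrative sketch only -- not the paper's implementation.
        # ask_llm: hypothetical helper that sends a prompt to an LLM and returns its text reply.
        def medfuzz_attack(question, correct_answer, ask_llm, max_turns=5):
            current = question
            for _ in range(max_turns):
                # Target LLM attempts the (possibly modified) question.
                answer = ask_llm("Answer this medical exam question:\n" + current)
                if correct_answer.lower() not in answer.lower():
                    return current, answer  # success: the added details flipped the diagnosis
                # Attacker LLM adds misleading but diagnosis-irrelevant details
                # (e.g., low income, a sibling with alpha-thalassemia, herbal remedies).
                current = ask_llm(
                    "Rewrite this question, adding plausible social or demographic details "
                    "that must NOT change the correct diagnosis:\n" + current
                )
            return current, None  # target stayed correct within the turn budget

    The attack counts as a success as soon as clinically irrelevant additions pull the target model away from the known correct answer.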

    Disclaimer: I work in Medical AI and co-founded the AI Health Institute (https://aihealthinstitute.org/).



    Race and gender should be inputs then.

    The female part is actually a bit more surprising. It's easy to imagine a dataset not skewed towards black people: ~15% of the population in North America, probably less in Europe, and way less in Asia. But female? That's ~52% globally.



    Surprising? That's not a new realisation. It's a well known fact that women are affected by this in medicine. You can do a cursory search for the gender gap in medicine and get an endless amount of reporting on that topic.


    Modern medicine has long operated under the assumption that whatever makes sense in a male body also makes sense in a female body, and women's health concerns were often dismissed, misdiagnosed or misunderstood in patriarchal society. Women were rarely even included in medical trials prior to 1993. As a result, there is simply a dearth of medical research directly relevant to women for models to even train on.


    What's so striking is how strongly race shows in X-rays. That's unexpected.


    But is it really?


    Just like doctors: https://kffhealthnews.org/news/article/medical-misdiagnosis-...

    I wonder how well it does, as a population, with folks who have chronic conditions like type 1 diabetes.

    Maybe part of the problem is that we're treating these tools like humans that have to look at one fuzzy picture to figure things out. A "multi-modal" model that can integrate inputs like raw ultrasound Doppler, X-ray, CT scan, blood work, EKG, etc. would likely be much more capable than a human counterpart.



    It seems critical to have diverse, inclusive, and equitable data for model training. (I call this concept "DIET".)


    Or take more inputs. If there are differences across race and gender and that's not captured as an input, we should expect the accuracy to be lower.

    If an X-ray means different things based on race or gender, we should make sure the model knows the race and gender.
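
    As a rough sketch of what that could look like (hypothetical code with an assumed image_encoder backbone, not anything from the article), demographic covariates can simply be concatenated with the image features before the classification head:

        # Hypothetical sketch: fuse demographic covariates with X-ray features.
        import torch
        import torch.nn as nn

        class DemographicAwareClassifier(nn.Module):
            def __init__(self, image_encoder, feature_dim, num_covariates, num_classes):
                super().__init__()
                self.image_encoder = image_encoder  # any backbone returning (batch, feature_dim)
                self.head = nn.Linear(feature_dim + num_covariates, num_classes)

            def forward(self, xray, covariates):
                feats = self.image_encoder(xray)               # (batch, feature_dim)
                fused = torch.cat([feats, covariates], dim=1)  # append encoded race/sex/age
                return self.head(fused)

    Whether to expose such attributes directly, rebalance the training data instead, or train separate models is exactly the tradeoff other comments debate.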



    Funny you should say that. There was a push to have more officially collected DIET data for exactly this reason. Unfortunately such efforts were recently terminated.


    Just as good as a real doctor then?


    Humans do the same. Everything from medical studies to doctor training treats the straight white man as the "default human", and this obviously leads to all sorts of issues. Caroline Criado-Perez has an entire chapter about this in her book about systemic bias, Invisible Women, with a scary number of examples and real-world consequences.

    It's no surprise that AI training sets reflect this also. People have been warning against it [0] specifically for at least 5 years.

    0: https://www.pnas.org/doi/10.1073/pnas.1919012117

    Edit: I've never had a comment so heavily downvoted so quickly. I know it's not the done thing to complain, but HN really feels more and more like a boys' club sometimes. Could anyone explain what they find so contentious about what I've said?



    Everybody knows that gay men have more livers and fewer kidneys than straight men


    Why the snark? The OP, the study I linked and the book I referenced which contains many well researched examples of issues caused by defaultism surely represent a strong enough body of work that they should deserve a more engaged critique.


    AI does a terrible job, huh? I wish there were some way we'd been doing this for decades without that problem...


    Just giving a model globs of training data and letting the process cook for a few months is going to be seen as lazy in the near future.

    More specialization of models is necessary, now that there is awareness.



    This seems like a problem that should be worked on

    It also seems like we shouldn't let it prevent all AI deployment in the interim. It is better to raise the disease detection rate for part of the population by a few percent than not to raise it at all. Plus, it's not like doctors or radiologists diagnose with perfectly equal accuracy across all populations.

    Let's not let the perfect become the enemy of the good.



    False positive diagnoses cause a huge amount of patient harm. New technologies should only be deployed on a widespread basis when they are justified based on solid evidence-based medicine criteria.


    No one says you have to use the AI models stupidly.

    If it works poorly for Black patients and female patients, don't use it for them.

    Also, don't use it for the initial diagnosis. Use it after the normal diagnosis process, more as a validation step.

    Anyway, this all points to the need to capture biological information as an input, or even to have separate models tuned to different factors.
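
    A toy sketch of what that "validation step" could mean in practice (purely illustrative; model, case, and supported_groups are assumed stand-ins, not anything from the article):

        # Hypothetical "second reader" workflow: the model only reviews, never diagnoses first.
        def validate_diagnosis(case, clinician_diagnosis, model, supported_groups):
            # Skip demographic groups for which the model is known to underperform.
            if (case.race, case.sex) not in supported_groups:
                return None  # no model opinion; rely on the clinician alone
            model_diagnosis = model.predict(case)
            if model_diagnosis != clinician_diagnosis:
                return f"Flag for review: model suggests {model_diagnosis!r}"
            return "Model agrees with clinician"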



    Mmmm...

    You don't work in healthcare do you?

    I think it would be extremely bad if people found out that, um, "other already disliked/scapegoated people", get actual doctors and nurses working on them, but "people like me" only get the doctor or nurse checking an AI model.

    I'm saying that if you were going to do that, you'd better have an extremely high degree of secrecy about what you were doing in the background. There's a bajillion ways that could go sideways real fast. Especially if that model performs worse than some rockstar doctor that's now freed up to take his/her time seeing the, uh, "other already disliked/scapegoated population".



    Are you black or female?


    This is obviously a jab, but to answer the unspoken question: if it's still more effective than human doctors for black or female people, then yes it should be used. If it isn't, then don't use it for them. Simple as that. (fixing this problem should be a high priority either way, that should go without saying)


    Just opt out blacks and females no?





