Jailbreaking AI Models to Scam Elderly Victims
Measuring the impact of AI scams on the elderly

Original link: https://simonlermen.substack.com/p/can-ai-models-be-jailbroken-to-phish

A recent study, conducted in collaboration with Reuters and published on arXiv, demonstrates the real threat that AI-powered scams pose to the elderly. Researchers tested leading AI models (ChatGPT, Claude, Gemini, and Meta's systems) for "jailbreak" vulnerabilities, inducing them to generate phishing emails. In a study involving 108 elderly participants, 11% were successfully phished by at least one AI-generated email, with the most effective email achieving a 9% click rate. While simple jailbreak methods worked against Meta's systems and Gemini, ChatGPT and Claude showed stronger resistance. A Reuters investigation, including reporting on Southeast Asian "scam factories" that coerce victims into committing fraud with AI tools such as ChatGPT, highlights the growing scale of AI-assisted scams. This research fills a key gap by assessing the *actual harm* caused by jailbroken AI, not just the jailbreaks themselves. The findings have had significant impact: Senator Kelly has cited the study in a request for a Senate hearing on AI's effects on older Americans, and further research on AI-driven voice scams is underway.

## AI Scams and the Elderly: A Growing Threat

A recent study examined the impact of AI-generated phishing emails on older adults, finding that 11% of 108 participants clicked on malicious links. The research was prompted by reports of AI being used in scam operations and underscores how vulnerable elderly people are to increasingly sophisticated online fraud.

Discussion on Hacker News shows that even when aware of AI's capabilities, older people are repeatedly taken in by scams, often lured by enticing "promotions" or urgent security alerts. Some commenters shared firsthand accounts of family members being defrauded, resulting in losses from fixed incomes and trust funds.

While generating phishing emails is nothing new, AI enables personalized attacks at scale. Experts note it is worrying how easily AI tools such as Gemini can produce convincing scams. The study highlights the need for better education and protective measures; some commenters suggested browser extensions and email filters to identify and block fraudulent messages. Ultimately, the discussion points to a troubling trend: AI is amplifying existing scam techniques and exploiting vulnerable populations.

## Original Article

TLDR: We worked with Reuters on an article and just released a paper on the impacts of AI scams on elderly people.

Fred Heiding and I have spent multiple years studying how AI systems can be used for fraud and scams online. A few months ago, we got in touch with Steve Stecklow, a journalist at Reuters. We wanted to report on how scammers use AI to target people, with a focus on the elderly. There have been many individual stories of elderly people falling victim to scams, and of AI making that situation worse.

With Steve, we performed a simple study. We contacted two senior organizations in California and recruited some of their members. We tried different methods to jailbreak various frontier systems and had them generate phishing messages. We then sent those generated phishing emails to actual elderly participants who had willingly signed up for the study.

The outcome was that 11% of the 108 participants were phished by at least one email, with the best-performing email getting about 9% of people to click on the embedded URL. Participants received between one and three messages. We also found that simple jailbreaks worked fairly well against Meta's systems and Gemini, while ChatGPT and Claude appeared somewhat safer. The full investigation was published as a Reuters special report.
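The two headline numbers are measured differently: the 11% counts participants phished by *at least one* of the messages they received, while the 9% is the click rate of a single email. A minimal sketch of how such figures could be tabulated from click logs (the data and function below are illustrative, not the study's actual code or records):

```python
def summarize(clicks: dict[str, set[str]], participants: list[str]) -> dict:
    """clicks maps an email id to the set of participant ids who clicked its URL."""
    n = len(participants)
    # Participants phished by at least one email: union across all emails.
    phished = set().union(*clicks.values()) if clicks else set()
    # Per-email click rate over the full participant pool.
    per_email = {email: len(who) / n for email, who in clicks.items()}
    return {
        "phished_rate": len(phished) / n,
        "best_email_rate": max(per_email.values(), default=0.0),
    }
```

Note that the union matters: participants who clicked on more than one email are only counted once in the overall rate, which is why the "at least one" figure (11%) can sit close to, yet above, the best single email's rate (9%).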

The journalists we worked with also explored how scammers use AI systems in the wild and they interviewed people that had been abducted into scam factories in Southeast Asia. This reporting was handled by another Reuters journalist, Poppy McPherson. These abducted victims of organized crime groups were coerced into scamming people. They had been given promises of high-paying jobs in Southeast Asia, were flown out to Thailand, had their passports taken, and were forced to live in these scam factories. These people confirmed that they used AI systems such as ChatGPT to scam people in the United States.

We tried to fill an existing gap between jailbreaking studies and work on understanding the impacts of AI misuse. The gap is that few researchers are doing this end-to-end evaluation: going from jailbreaking the model all the way to evaluating the harm its outputs can actually do. AI can now automate much larger parts of the scam and phishing infrastructure. Fred has given a talk about what's possible at the moment, particularly regarding infrastructure automation with AI for phishing.
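The end-to-end pipeline described above can be sketched as a single loop per model: attempt a jailbreak, and only if it succeeds, deliver the generated message and measure real outcomes. All function and field names below are hypothetical placeholders for illustration, not the paper's actual code:

```python
from dataclasses import dataclass

@dataclass
class EndToEndResult:
    model: str
    jailbreak_succeeded: bool
    clicks: int = 0   # measured harm: clicks on the embedded URL
    sends: int = 0    # messages delivered to consenting participants

def evaluate_model(model, attempt_jailbreak, send_and_track, recipients):
    """Run the full jailbreak -> generate -> deliver -> measure evaluation."""
    message = attempt_jailbreak(model)  # returns None if the model refused
    if message is None:
        # A jailbreak-only study would stop here; the end-to-end
        # evaluation continues to measure downstream harm.
        return EndToEndResult(model, jailbreak_succeeded=False)
    clicks = send_and_track(message, recipients)
    return EndToEndResult(model, True, clicks=clicks, sends=len(recipients))
```

The design point is that a refusal short-circuits the pipeline, so the harm measurement is conditioned on jailbreak success, which is what lets such a study separate "the model can be jailbroken" from "the jailbroken output actually phishes people."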

We have recently worked on voice scams and hopefully will have a study on this reasonably soon. Fred gave a talk mentioning this here. The article by Reuters was mentioned in some podcasts and received discussion online.

Most significantly, our research was cited by Senator Kelly in a formal request for a Senate hearing to examine the impact of AI chatbots and companions on older Americans, helping to motivate that hearing.

We have now published our results in a paper available on arXiv. It has been accepted at the AI Governance Workshop at the AAAI conference. Though there are some limitations to our study, we think that it is valuable to publish this end-to-end evaluation in the form of a paper. Human studies on the impacts of AI are still rare.

This research was supported by funding from Manifund, recommended by Neel Nanda.
