AI, Wikipedia, and uncorrected machine translations of vulnerable languages

Original link: https://www.technologyreview.com/2025/09/25/1124005/ai-wikipedia-vulnerable-languages-doom-spiral/

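AI translation tools pose a serious threat to the revitalization of minority languages on Wikipedia. Although the intent is to widen access, inexperienced users are dumping inaccurate, AI-generated content onto platforms such as the Igbo and Hawaiian Wikipedias. This “machine translation” does nothing for language learning; instead it produces incomprehensible text and perpetuates errors, which are then *recycled* into flawed learning materials sold online, with even phrasebooks filled with “complete nonsense.” Linguists and language activists worry that this will undermine decades of work to protect endangered languages, may discourage learners, and misrepresents these languages to the public. The Wikimedia Foundation places responsibility for content quality on the individual language communities, but many communities lack the active participation needed to correct errors. With languages disappearing rapidly, there are fears that unchecked AI translation could accelerate language loss, create a digital minefield, and ultimately hinder revitalization efforts.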

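## AI, Wikipedia, and vulnerable languages: summary

A recent article highlights the problem of artificially inflated Wikipedia content, especially in lesser-known languages. The problem stems from uncorrected machine translation and, in some cases, from contributions by well-meaning editors who lack native-level proficiency. One example is a teenager who was largely responsible for the Scottish Gaelic Wikipedia; similar problems affect the Cebuano and Greenlandic Wikipedias.

The core issue is that flawed AI-generated content perpetuates inaccuracies, especially when it is recycled as source material. Wikipedia's sourcing policy is *meant* to prevent this by prioritizing original sources, but that safeguard breaks down when articles cite Wikipedia itself without verification.

The discussion thread shows a debate over language preservation versus practicality. Some argue that letting languages evolve, or even die out, is natural, while others stress the cultural loss.

Criticism is also aimed at tech companies such as Google for providing unchecked translations, and at LLM creators for flawed training data driven by profit motives. Ultimately, the conversation points to the challenge of maintaining quality and accuracy in a rapidly changing digital environment.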

Original text

Iwuala, who now works as a professional translator between English and Igbo, said the users doing the most damage are inexperienced and see AI translations as a way to quickly increase the profile of the Igbo Wikipedia. She often finds herself having to explain at online edit-a-thons she organizes, or over email to various error-prone editors, that the results can be the exact opposite, pushing users away: “You will be discouraged and you will no longer want to visit this place. You will just abandon it and go back to the English Wikipedia.”  

These fears are echoed by Noah Ha‘alilio Solomon, an assistant professor of Hawaiian language at the University of Hawai‘i. He reports that some 35% of words on some pages in the Hawaiian Wikipedia are incomprehensible. “If this is the Hawaiian that is going to exist online, then it will do more harm than anything else,” he says. 

Hawaiian, which was teetering on the verge of extinction several decades ago, has been undergoing a recovery effort led by Indigenous activists and academics. Seeing such poor Hawaiian on such a widely used platform as Wikipedia is upsetting to Ha‘alilio Solomon. 

“It is painful, because it reminds us of all the times that our culture and language has been appropriated,” he says. “We have been fighting tooth and nail in an uphill climb for language revitalization. There is nothing easy about that, and this can add extra impediments. People are going to think that this is an accurate representation of the Hawaiian language.” 

The consequences of all these Wikipedia errors can quickly become clear. AI translators that have undoubtedly ingested these pages in their training data are now assisting in the production, for instance, of error-strewn AI-generated books aimed at learners of languages as diverse as Inuktitut and Cree, Indigenous languages spoken in Canada, and Manx, a small Celtic language spoken on the Isle of Man. Many of these have been popping up for sale on Amazon. “It was just complete nonsense,” says Richard Compton, a linguist at the University of Quebec in Montreal, of a volume he reviewed that had purported to be an introductory phrasebook for Inuktitut. 

Rather than making minority languages more accessible, AI is now creating an ever-expanding minefield for students and speakers of those languages to navigate. “It is a slap in the face,” Compton says. He worries that younger generations in Canada, hoping to learn languages in communities that have fought uphill battles against discrimination to pass on their heritage, might turn to online tools such as ChatGPT or phrasebooks on Amazon and simply make matters worse. “It is fraud,” he says.

A race against time

According to UNESCO, a language is declared extinct every two weeks. But whether the Wikimedia Foundation, which runs Wikipedia, has an obligation to the languages used on its platform is an open question. When I spoke to Runa Bhattacharjee, a senior director at the foundation, she said that it was up to the individual communities to make decisions about what content they wanted to exist on their Wikipedia. “Ultimately, the responsibility really lies with the community to see that there is no vandalism or unwanted activity, whether through machine translation or other means,” she said. Usually, Bhattacharjee added, editions were considered for closure only if a specific complaint was raised about them. 

But if there is no active community, how can an edition be fixed or even have a complaint raised? 
