AI tools are spotting errors in research papers

Original link: https://www.nature.com/articles/d41586-025-00648-5

Two AI projects, the Black Spatula Project and YesNoError, are using large language models to detect errors in research papers, including errors in calculations, methodology and references. Both were inspired by a case in which AI could have quickly caught a key mathematical error in a study of flame retardants in black plastic cooking utensils, and they aim to identify mistakes proactively, before publication. The Black Spatula Project, an open-source effort, has analysed around 500 papers and contacts authors directly about potential errors. YesNoError, funded by its own cryptocurrency, has analysed more than 37,000 papers and flags those that may be flawed. Although the projects have tentative support from research-integrity experts, there are concerns about false positives and reputational damage. Researchers stress the importance of verifying claims and of using AI as a tool to triage papers for further scrutiny. The projects hope to be used by researchers and journals as a final check, to keep errors and fraud out of the scientific literature.

Hacker News is discussing the role of AI tools in spotting errors in research papers. Some commenters argue they could improve paper quality by helping authors and reviewers catch mistakes; others worry they could help unethical researchers produce more convincing fake papers. There are concerns that AI may generate false positives, creating unnecessary work for researchers. Notably, current AI is seen as better at detecting inconsistencies and typos than actual fraud or logical errors. Some point out that the idea that an AI could "solve science" by reading all the papers in a field and finding errors on its own is itself mistaken. There are also worries about one particular AI tool's ties to cryptocurrency, and that putting AI in charge of the narrative in this context may not be ideal. Still, there is optimism that AI can serve as a useful screening tool that complements human review, improving overall research quality by flagging issues that need further human thought and review.

Original article
[Image: A large stack of papers and folders with coloured tabs.]

Two new AI tools check for errors in research papers, including in the calculations, methodology and references. Credit: Jose A. Bernat Bacete/Getty

Late last year, media outlets worldwide warned that black plastic cooking utensils contained worrying levels of cancer-linked flame retardants. The risk was found to be overhyped – a mathematical error in the underlying research suggested a key chemical exceeded the safe limit when in fact it was ten times lower than the limit. Keen-eyed researchers quickly showed that an artificial intelligence (AI) model could have spotted the error in seconds.

The incident has spurred two projects that use AI to find mistakes in the scientific literature. The Black Spatula Project is an open-source AI tool that has so far analysed around 500 papers for errors. The group, which has around eight active developers and hundreds of volunteer advisers, hasn’t made the errors public yet; instead, it is approaching the affected authors directly, says Joaquin Gulloso, an independent AI researcher based in Cartagena, Colombia, who helps to coordinate the project. “Already, it’s catching many errors,” says Gulloso. “It’s a huge list. It’s just crazy.”

The other effort is called YesNoError and was inspired by the Black Spatula Project, says founder and AI entrepreneur Matt Schlicht. The initiative, funded by its own dedicated cryptocurrency, has set its sights even higher. “I thought, why don’t we go through, like, all of the papers?” says Schlicht. He says that their AI tool has analysed more than 37,000 papers in two months. Its website flags papers in which it has found flaws – many of which have yet to be verified by a human, although Schlicht says that YesNoError has a plan to eventually do so at scale.

Both projects want researchers to use their tools before submitting work to a journal, and journals to use them before they publish, the idea being to avoid mistakes, as well as fraud, making their way into the scientific literature.

The projects have tentative support from academic sleuths who work in research integrity. But there are also concerns over the potential risks. How well the tools can spot mistakes, and whether their claims have been verified, must be made clear, says Michèle Nuijten, a researcher in metascience at Tilburg University in the Netherlands. “If you start pointing fingers at people and then it turns out that there was no mistake, there might be reputational damage,” she says.

Others add that although there are risks and the projects need to be cautious about what they claim, the goal is the right one. It is much easier to churn out shoddy papers than it is to retract them, says James Heathers, a forensic metascientist at Linnaeus University in Växjö, Sweden. As a first step, AI could be used to triage papers for further scrutiny, says Heathers, who has acted as a consultant for the Black Spatula Project. “It’s early days, but I’m supportive” of the initiatives, he adds.

AI sleuths

Many researchers have dedicated their careers to spotting integrity concerns in papers – and tools to check certain facets of papers already exist. But advocates hope that AI could carry out a wider range of checks in a single shot and handle a larger volume of papers.

Both the Black Spatula Project and YesNoError use large language models (LLMs) to spot a range of errors in papers, including ones of fact as well as in calculations, methodology and referencing.

The systems first extract information, including tables and images, from the papers. They then craft a set of complex instructions, known as a prompt, which tells a ‘reasoning’ model — a specialist type of LLM — what it is looking at and what kinds of error to hunt for. The model might analyse a paper multiple times, either scanning for different types of error each time, or to cross-check results. The cost of analysing each paper ranges from 15 cents to a few dollars, depending on the length of the paper and the series of prompts used.
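The multi-pass workflow described above can be sketched in Python. This is a hypothetical illustration, not either project's actual code: `call_reasoning_model` is a stand-in for a real LLM API call, and the error categories are assumptions drawn from the error types the article mentions.

```python
# Illustrative sketch of a multi-pass paper-checking pipeline:
# one analysis pass per error category, with results collected
# for later human verification.

ERROR_TYPES = ["calculation", "methodology", "referencing", "factual"]


def call_reasoning_model(prompt: str) -> list[str]:
    """Stub for a reasoning-model API call. A real system would send
    the prompt to an LLM and parse its response into claimed errors;
    this stub simply reports none."""
    return []


def check_paper(paper_text: str, tables: list[str]) -> dict[str, list[str]]:
    """Scan one paper once per error type and collect claimed errors
    keyed by category."""
    findings: dict[str, list[str]] = {}
    for error_type in ERROR_TYPES:
        prompt = (
            f"You are reviewing a research paper for {error_type} errors.\n"
            f"Extracted tables: {tables}\n"
            f"Paper text: {paper_text}\n"
            "List each suspected error, or reply with an empty list."
        )
        findings[error_type] = call_reasoning_model(prompt)
    return findings


report = check_paper("Example paper text...", ["Table 1: ..."])
# Every claimed error would still need checking by a subject-matter
# expert, since a share of flags are false positives.
```

A production version would also cross-check results across passes and extract figures as well as tables, per the article's description; those steps are omitted here for brevity.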

The rate of false positives, instances when the AI claims an error where there is none, is a major hurdle. Currently, the Black Spatula Project’s system is wrong about an error around 10% of the time, says Gulloso. Each alleged error must be checked with experts in the subject, and finding them is the project’s greatest bottleneck, says Steve Newman, the software engineer and entrepreneur who founded the Black Spatula Project.

So far, Schlicht’s YesNoError team has quantified the false positives in only around 100 mathematical errors that the AI found in an initial batch of 10,000 papers. Of the 90% of authors who responded to Schlicht, all but one agreed that the error detected was valid, he says. Eventually, YesNoError is planning to work with ResearchHub, a platform which pays PhD scientists in cryptocurrency to carry out peer review. When the AI has checked a paper, YesNoError will trigger a request to verify the results, although this has not yet started.

False positives
