Reproducibility project fails to validate dozens of biomedical studies

Original link: https://www.nature.com/articles/d41586-025-01266-x

A large study called the Brazilian Reproducibility Initiative, bringing together more than 50 research teams, investigated the reproducibility of three research methods commonly used in Brazilian biomedical research. Its preprint results show that fewer than half of the tested experiments could be replicated.

The initiative set out to assess publications based on methodology rather than perceived impact or citation counts. The project involved 213 scientists at 56 laboratories and faced logistical challenges during the COVID-19 pandemic, as well as differing interpretations of the protocols. The researchers selected three commonly used methods, reviewed papers published between 1998 and 2017 by teams with Brazilian affiliations, and replicated 47 experiments from the selected papers.

Reproducibility was judged against several criteria, including statistical significance and effect direction. Only 21% of the experiments met at least half of the criteria. The study also found that effect sizes in the original papers were substantially larger than those in the replication attempts, suggesting that the original publications overestimated their results. The authors hope the study will drive improvements in Brazilian science through policy changes and university initiatives.

A Hacker News thread discusses the reproducibility project that failed to validate many biomedical studies. Commenters debated what failed replications mean, noting that some research builds on data that may be flawed or even fabricated, as in some Alzheimer's disease research. Several mentioned reproducibility problems in their own fields, especially computer science, where 100% replication should in principle be achievable.

Suggestions for improving reproducibility included pre-registering studies, mandating version control for data analysis, creating incentives for replication work, and building private trust networks in which scientists share their assessments of papers. One commenter suggested that PhD candidates replicate existing studies instead of pursuing original work.

Others discussed the pressure to publish and the limitations of peer review, arguing that the scientific method relies on self-correction even when individual papers are flawed. The discussion touched on whether current levels of reproducibility warrant cutting research funding, with counterarguments stressing the enormous progress science has made. Some cautioned against equating failed replication with fraud, emphasizing the difficulties inherent in research.

Original article
[Image: Two female researchers wearing full PPE work at extraction units in a lab, their faces reflected in the glass.]

A replication drive focused on results that lean on three methods commonly used in biomedical research in Brazil. Credit: Mauro Pimentel/AFP/Getty

In an unprecedented effort, a coalition of more than 50 research teams has surveyed a swathe of Brazilian biomedical studies to double-check their findings — with dismaying results.

The teams were able to replicate the results of fewer than half of the tested experiments [1]. That rate is in line with those found by other large-scale attempts to reproduce scientific findings. But the latest work is unique in focusing on papers that use specific methods and in examining the research output of a specific country, according to the research teams.

The results provide an impetus to strengthen the country’s science, the study’s authors say. “We now have the material to start making changes from within — whether through public policies or within universities,” says Mariana Boechat de Abreu, a metascience researcher at the Federal University of Rio de Janeiro (UFRJ) in Brazil and one of the coordinators of the project.

The work was posted on 8 April to the bioRxiv preprint server and has not yet been peer reviewed.

Ambitious undertaking

The massive experiment was coordinated by the Brazilian Reproducibility Initiative, a collaborative effort launched in 2019 by researchers at the UFRJ. The scientists wanted to assess publications “based on methods, rather than research area, perceived importance or citation counts”, de Abreu says. And they wanted to do so on a large scale. Ultimately, 213 scientists at 56 laboratories in Brazil were involved in the work.

The project unfolded during the COVID-19 pandemic, which brought numerous logistical challenges. And teams disagreed about how closely to follow the tested protocols. “It was like trying to turn dozens of garage bands, each with its own way of playing, into an orchestra,” says project coordinator Olavo Bohrer Amaral, a physician at the UFRJ.

The authors began by reviewing a random sample of life-sciences articles to determine the most common biomedical research methods used in Brazil, ensuring that any biomedical lab interested in joining the project would be capable of reproducing the experiments.

They ended up selecting three of these methods: an assay of cell metabolism, a technique for amplifying genetic material and a type of maze test for rodents. Then the authors randomly selected biomedical papers that relied on those methods and were published from 1998 to 2017 by research teams in which at least half the contributors had a Brazilian affiliation.

The collaborators initially chose a subset of 60 papers for replication, guided by factors such as whether a paper included certain statistical information. Three labs tested each experiment, and an independent committee judged which of those tests was a valid replication. The coalition performed 97 valid replication attempts of 47 experiments.

Falling short

The authors judged a paper’s replicability by five criteria, including whether at least half of the replication attempts had statistically significant results in the same direction as the original paper. Only 21% of the experiments were replicable using at least half of the applicable criteria.
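As a rough illustration of how a criterion of that kind can be applied, the sketch below checks whether at least half of a set of replication attempts are statistically significant in the same direction as the original result. It is a hypothetical sketch, not the initiative's analysis code; the class, function names, significance threshold and data are all assumptions made for illustration.

from dataclasses import dataclass

@dataclass
class Attempt:
    p_value: float          # two-sided p-value from one replication attempt (hypothetical)
    effect_direction: int   # +1 or -1: sign of the effect observed in that attempt

def meets_direction_criterion(attempts, original_direction, alpha=0.05):
    # Count attempts that are significant and agree with the sign of the original effect.
    successes = sum(
        1 for a in attempts
        if a.p_value < alpha and a.effect_direction == original_direction
    )
    # Criterion: at least half of the attempts succeed.
    return successes >= len(attempts) / 2

# Hypothetical data: three attempts at one experiment, original effect positive.
attempts = [Attempt(0.03, +1), Attempt(0.20, +1), Attempt(0.01, +1)]
print(meets_direction_criterion(attempts, original_direction=+1))  # True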

The authors also found that the effect size — the magnitude of the observed impact in the experiments — was, on average, 60% larger in the original papers than in the experimental follow-ups, indicating that published results tend to overestimate the effects of the interventions tested.
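For context, that comparison amounts to a simple ratio of effect sizes. The sketch below is purely illustrative, using made-up standardized effect sizes rather than data from the study.

def overestimation_ratio(original_effect, replication_effects):
    # Ratio of the original effect size to the mean effect size across replications;
    # a value of 1.6 corresponds to an original effect roughly 60% larger.
    replication_mean = sum(replication_effects) / len(replication_effects)
    return original_effect / replication_mean

# Hypothetical standardized effect sizes (e.g. Cohen's d).
print(overestimation_ratio(0.80, [0.55, 0.45, 0.50]))  # 1.6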
