How to avoid P hacking

Original link: https://www.nature.com/articles/d41586-025-01246-1

P hacking means manipulating data or analysis to reach a statistically significant result (P < 0.05), often under pressure to publish. Common forms include:

1. **Ending the experiment too early:** stopping as soon as a significant result appears, before the planned sample size is reached, which leaves the data unrepresentative.
2. **Repeating experiments until significance:** running the experiment again and again until a significant result turns up, then selectively reporting only the successful attempt.
3. **Selective reporting:** highlighting only the favourable outcomes among many measurements while ignoring or downplaying the rest.
4. **Tweaking the data:** deciding what to include or exclude based on the desire for significance rather than scientific reasoning, for example removing outliers without justification and without criteria fixed in advance.

These practices lead to false conclusions and deepen science's reproducibility crisis. To avoid P hacking, researchers should define sample sizes in advance, report all experimental replicates and relevant results (positive and negative), and set data-filtering rules before analysing the data. Transparency and honesty are essential.

This Hacker News thread discusses an article from Nature.com about avoiding P hacking, the practice of manipulating data or analysis to achieve statistically significant results. Commenters highlight various forms of P hacking, including data dredging, selectively analysing old datasets and prematurely stopping experiments. Several emphasize that the pressure to publish and secure funding incentivizes P hacking, and some argue that transparently reporting the research process, even when it involves exploratory data analysis, is preferable to hiding methodological choices. The discussion also touches on the importance of experimental design and of appropriate statistical methods, such as multiple-comparison corrections and sequential analysis, for ensuring the validity of findings. A few commenters point out that preregistering experiments can mitigate P hacking by creating transparency. Overall, the thread underscores the importance of ethical research practices and the need for systemic changes that incentivize reproducible research.

Original article
Image: a paper-cut rendering of a multiple bell-curve line graph. Credit: MirageC / Getty

It can happen so easily. You’re excited about an experiment, so you sneak an early peek at the data to see if the P value — a measure of statistical significance — has dipped below the threshold of 0.05. Or maybe you’ve tried analysing your results in several different ways, hoping one will give you that significant finding. These temptations are common, especially in the cut-throat world of publish-or-perish academia. But giving in to them can lead to what scientists call P hacking.

P hacking is the practice of tweaking the analysis or data to get a statistically significant result. In other words, you’re fishing for a desirable outcome and reporting only the catches, while ignoring all the times you came up empty. It might get you a publication in the short term, but P hacking contributes to the reproducibility and replicability crisis in science by filling the literature with dubious or unfounded conclusions.

Most researchers don’t set out to cheat, but they could unknowingly make choices that push them towards a significant result. Here are five ways P hacking can slip into your research.

Ending the experiment too early

You might plan to gather 30 samples but find yourself running a quick analysis halfway through, just to see where things stand. If you notice a statistically significant difference after 15 samples, you might be inclined to stop the experiment early — after all, you’ve found what you were looking for.

But stopping an experiment once you find a significant effect but before you reach your predetermined sample size is classic P hacking. It’s like declaring the winner of an election after polling just half the electorate: the result might not be representative of reality. What’s the solution? Decide on the sample size or data-collection process ahead of time and stick to it, no matter how eager you are to see the results.
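
The cost of peeking is easy to show with a quick simulation. The sketch below is an illustration, not part of the article: it draws two groups from the same distribution (so any 'significant' difference is a false positive) and compares a single t-test at the planned sample size with testing after every new sample from the halfway point and stopping at the first P < 0.05. The group size of 30 and the halfway peek mirror the example above; everything else is an arbitrary choice.

```python
# A small simulation (not from the article) of how "peeking" at the data and
# stopping at the first significant P value inflates the false-positive rate.
# Both groups are drawn from the SAME distribution, so every "significant"
# difference is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_simulations = 2_000
planned_n = 30       # planned samples per group
first_peek = 15      # start peeking halfway through

def false_positive(peek: bool) -> bool:
    a = rng.normal(0, 1, planned_n)
    b = rng.normal(0, 1, planned_n)   # no true effect
    if peek:
        # Test after every new sample from the first peek onwards and
        # "stop" as soon as P < 0.05.
        for n in range(first_peek, planned_n + 1):
            if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
                return True
        return False
    # No peeking: a single test at the planned sample size.
    return stats.ttest_ind(a, b).pvalue < 0.05

for peek in (False, True):
    rate = np.mean([false_positive(peek) for _ in range(n_simulations)])
    print(f"{'peeking' if peek else 'fixed n':8s}: false-positive rate ~ {rate:.3f}")
# Typical output: the fixed-n rate stays near 0.05; peeking pushes it noticeably higher.
```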

Running experiments until you get a hit

Another often-unintentional form of P hacking is repeating the experiment or analysis until you obtain a statistically significant result. Imagine you run an experiment and the outcome is insignificant. You try again with a new batch of samples — still nothing. You repeat the study once more, and voila! P < 0.05. Success? Not quite. If you selectively report only the attempt that ‘worked’ and ignore those that didn’t, you’re engaging in P hacking by omission. As any gambler knows, if you roll the dice often enough, eventually you’ll get the result you want by chance alone (not that I’m a gambler). The better approach is to report all the experimental replicates, including those that didn’t work.
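
To put a number on the dice analogy: under the null hypothesis each independent repeat has roughly a 5% chance of crossing P < 0.05 by luck alone, so the chance of at least one 'hit' grows quickly with the number of attempts. A minimal back-of-the-envelope sketch (the repeat counts are arbitrary):

```python
# Probability of at least one false positive across k independent repeats
# of an experiment with no true effect, at a 0.05 significance threshold.
alpha = 0.05
for k in (1, 2, 3, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} repeats -> chance of at least one P < 0.05: {p_any:.2f}")
# Three repeats already give a ~14% chance; ten repeats, ~40%.
```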

Cherry-picking your results

A less benign form of P hacking is selective reporting. Imagine you measure several outcomes or observe your effect at multiple time points — for instance, testing a therapy’s impact on recipients’ blood pressure, cholesterol, weight and blood sugar regularly over an entire month. After analysing the data, you find that only one outcome — say, blood sugar at week 3 — showed a significant improvement. You might be tempted to highlight this one promising result and downplay the rest, or even omit them from your report. This is cherry-picking: by showing only the favourable data and ignoring everything else, you create a biased narrative.

In this example, people might think the therapy worked because it lowered blood sugar at week 3, even though the overall data are not so rosy. Putting these data into the paper’s supplementary material and continuing with the experiment on the basis of this one finding is also a no-no. You should report all relevant results, not just the ones that support the hypothesis. Science progresses faster when we know what doesn’t work, as well as what does.
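
When many outcomes and time points are tested, a multiple-comparison correction (one of the remedies raised in the discussion summarized above) keeps a single lucky P value from being over-interpreted. Here is a minimal sketch of a Bonferroni correction; the P values and the total of 16 tests are invented for the blood-sugar example, not taken from the article.

```python
# A Bonferroni correction applied to P values from multiple outcomes and
# time points. The P values below are invented for illustration; imagine
# 4 outcomes measured at 4 time points, i.e. 16 tests in total.
p_values = {
    "blood sugar, week 3": 0.012,
    "blood pressure, week 3": 0.21,
    "cholesterol, week 2": 0.08,
    "weight, week 4": 0.47,
    # ... plus the remaining 12 tests
}

alpha = 0.05
n_tests = 16   # correct for ALL tests performed, not only the ones shown
for outcome, p in p_values.items():
    adjusted = min(p * n_tests, 1.0)   # Bonferroni-adjusted P value
    verdict = "significant" if adjusted < alpha else "not significant"
    print(f"{outcome}: raw P = {p:.3f}, adjusted P = {adjusted:.2f} -> {verdict}")
# The week-3 blood-sugar result (raw P = 0.012) no longer clears 0.05 once
# the 16 comparisons are accounted for (adjusted P ~ 0.19).
```

More powerful alternatives such as the Benjamini-Hochberg procedure exist, but the principle is the same: account for every comparison that was made, not just the one you report.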

Tweaking your data

In data analysis, you often have to make judgements about what to include, what to exclude and how to report the data. P hacking can sneak in when those decisions are guided by the desire to achieve significance rather than by scientific reasoning. For example, you might notice an outlier in your data set. Including it in the analysis gives you a P value of 0.08, whereas excluding it brings P down to 0.03. Problem solved? Not quite.

In these cases, it is best practice to go back to the original data or laboratory notes to determine whether experimental conditions could explain this outlier. Perhaps you pipetted double the amount of reagent into your sample, or construction work nearby during the time you were testing that animal affected its behaviour. Researchers can often rationalize their data-filtration decisions, and most of those decisions are warranted. But if the real motive is to turn an insignificant result into a significant one, it crosses into questionable territory. The key is to decide on data-filtering rules before looking at the results. If, for some reason, you have to make a change after data collection, explain that — and say why.
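
One way to follow that advice is to write the filtering rule down, ideally as code, before the results are analysed, so the same rule applies no matter what it does to the P value. The sketch below is a hypothetical illustration in that spirit: samples are excluded only when the lab notes record a documented problem (such as the double-pipetted reagent mentioned above), and every exclusion is reported with its reason. The record structure and field names are invented for the example.

```python
# A hypothetical data-filtering rule fixed before analysis: exclude a sample
# only when the lab notes document a protocol deviation, never because of the
# P value it produces, and report every exclusion with its reason.
samples = [
    {"id": "A1", "value": 5.0, "note": None},
    {"id": "A2", "value": 5.2, "note": None},
    {"id": "A3", "value": 9.7, "note": "double reagent volume pipetted"},
    {"id": "A4", "value": 4.9, "note": None},
]

kept = [s for s in samples if s["note"] is None]
excluded = [s for s in samples if s["note"] is not None]

print("analysed:", [s["id"] for s in kept])
for s in excluded:
    print(f"excluded {s['id']}: {s['note']}")
```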
