Most ‘significant’ results occur on the first try
Leif Nelson has a fascinating blog on the NHST (null-hypothesis significance testing) method, statistical significance, and the chance of a false positive. The question can be posed as follows: suppose 100 labs begin the same bad study, i.e., a study involving variables that in fact have no effect. Once a lab gets a “hit”, it stops trying. If the chosen significance level is p (commonly p = 0.05), then about 5 of the 100 labs will, by chance, get a “hit”, a significant result, on the first try. If the remaining 95 labs run the study again, about 5% of them, between 4 and 5 labs, will “hit”, and so on. So the number of ‘hits’ is a declining (geometric, i.e., exponential) function of the trial number, even though the chance of a hit is constant, trial by trial.
The reason for the trial-by-trial decline, of course, is that every lab has an opportunity for a hit on trial 1, but only a fraction 1-p = 0.95 of them go on to a second trial, a fraction (1-p)^2 to a third, and so on. The probability of a hit per opportunity remains constant at p. The average number of trials per hit is 1/p = 20 in this case. But the modal number is just one, because the opportunity is maximal on the first trial.
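The arithmetic above can be checked with a short Python sketch. The function names, the choice of 100 labs, and the truncation of the infinite sum at 10,000 terms are illustrative assumptions, not anything taken from Nelson's blog.

```python
def expected_first_hits(labs, p, max_trial):
    """Expected number of labs whose first 'hit' falls on trial 1..max_trial.

    A lab reaches trial k only if it missed on the k-1 earlier trials,
    which happens with probability (1 - p) ** (k - 1).
    """
    return [labs * (1 - p) ** (k - 1) * p for k in range(1, max_trial + 1)]


def mean_trials_to_hit(p, terms=10_000):
    """Approximate the mean trial number of the first hit (should be near 1/p)."""
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, terms + 1))


hits = expected_first_hits(labs=100, p=0.05, max_trial=5)
print([round(h, 2) for h in hits])         # declines trial by trial
print(round(mean_trials_to_hit(0.05), 3))  # close to 1/p = 20
print(hits.index(max(hits)) + 1)           # modal trial number
```

The list shrinks by a factor of 1-p at each step, so its largest entry is always the first one, while the mean sits far out at 1/p.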
On the other hand, the more trials are carried out, the more likely it is that there will be a ‘hit’, even though the largest number of hits (though not the largest probability of a hit) falls on the first trial. To see this, imagine running the hundred experiments for up to, say, 10 repeats each. The probability of non-significance on trial 1 is 1-p = 1-0.05 = 0.95; on both of trials 1 and 2, (1-p)^2; on all of trials 1 through 3, (1-p)^3; and so on. The trials are independent, so the probability of failure, no ‘hit’ anywhere from trial 1 through trial N, is (1-p)^N. The probability of success, a ‘hit’ somewhere from trial 1 to trial N, is the complement of that:
P(‘hit’|N) = 1-(1-p)^N,
which is an increasing, not a decreasing, function of N. For N = 10 and p = 0.05, for example, it is 1-0.95^10 ≈ 0.40. In other words, even though most false positives occur on the first trial (because opportunities are then at a maximum), it is also true that the more trials are run, the more likely it is that one of them will be a false positive.
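A minimal Python sketch can tabulate this cumulative probability and cross-check it against a simulation of labs that stop at their first hit. The function names, the seed, and the figure of 20,000 simulated labs are assumptions chosen for illustration only.

```python
import random


def prob_at_least_one_hit(p, n):
    """Closed form: chance of at least one 'hit' in n independent trials."""
    return 1 - (1 - p) ** n


def simulate_hit_rate(p, n, labs=20_000, seed=1):
    """Fraction of simulated labs that get a hit within n trials.

    Each lab studies a true null effect, so every trial 'hits' with
    probability p; a lab stops as soon as it gets its first hit.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(labs):
        for _ in range(n):
            if rng.random() < p:
                hits += 1
                break
    return hits / labs


for n in (1, 5, 10, 20, 50):
    exact = prob_at_least_one_hit(0.05, n)
    approx = simulate_hit_rate(0.05, n)
    print(f"N={n:2d}  exact={exact:.3f}  simulated={approx:.3f}")
```

The exact column rises steadily with N, and the simulated column should track it to within sampling error.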
But Leif Nelson is undoubtedly correct that it is the 5% that turn up ‘heads’ on the very first try that are so persuasive, both to the researcher who gets the result and to the reviewer who judges it.