
This article, linked to by Glen, claims most published results are false. The argument is based on an analysis of how papers relying on statistical significance analyse and report their statistical results and is bolstered by related arguments about publication bias.

An intuitive way to understand the possibility of publication bias is to note that insignificant results (e.g. this drug is no better than placebo) don't make great publications, but apparently significant results (like "MMR vaccine causes autism") will get you headlines in the popular press long after other evidence has refuted your conclusion. And trawling random data for significant results will throw up many results with p-values significant at the 5% level. Publishing such results while ignoring the insignificant results of the trawl is a poor way to determine truth, but a good way to get publication kudos. Some modern authors believe this is endemic not just in social science but also in medical epidemiology.
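To make the trawling point concrete, here is a toy simulation (my own illustration with made-up numbers, not something taken from the linked article): run many significance tests on pure noise and count how many come out "significant" at the 5% level.

```python
# Data trawling in miniature: many tests on pure noise, then report only the "hits".
# (Illustrative sketch; the group sizes and number of outcomes are arbitrary.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_outcomes, n_per_group = 100, 30

significant = 0
for _ in range(n_outcomes):
    # Both groups are drawn from the same distribution, so no real effect exists.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        significant += 1

# Roughly 5 of the 100 null comparisons will clear p < 0.05 purely by chance;
# publishing only those while shelving the rest fills the literature with noise.
print(f"{significant} of {n_outcomes} null comparisons were significant at p < 0.05")
```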

So it is important to know: are most published results likely to be wrong?

matt_black
Casebash
  • The article you link to is talking about a specific type of published result, i.e. the existence of a treatment effect in data with a certain confidence level. I recommend making your question more specific. Also, see http://www.amazon.com/Cult-Statistical-Significance-Economics-Cognition/dp/0472050079 for a thorough examination of confidence intervals, p-values, and being wrong. – justin cress Apr 10 '11 at 13:49
  • Yes, that accounts for the continual backward movement of science and technology. In a few years time we will falsify ourselves back to the stone age. ;) However, p = 0.05 has always been a very dull (Occam's) razor. You get results like that, you need to figure out how to better your tests. Well, that or use your statistical roulette to market multibillion dollar pharmaceuticals with no better efficacy than placebo. –  Apr 10 '11 at 14:01
  • Quite possibly a lot of what's published is incorrect, either deliberately or through errors. Such is the way of science (today, as in the past and no doubt into the future). If scientific progress is made, those false ideas get rejected and replaced with other, better if not necessarily complete, ones over time. – jwenting Apr 10 '11 at 14:57
  • @jwenting: Some of what's published is incorrect, not due to errors or deliberate action. Approximately 5% of tests where there is no effect at all will produce results significant at the 5% level. – David Thornley Apr 11 '11 at 00:00
  • @Wayfaring Stranger: In some fields, they'd laugh at you if you tried to present a result significant at the 5% level. In other fields, you're dealing with a lot of randomness and essentially taking what experimental subjects you can get, and 5% is a reasonable goal. – David Thornley Apr 11 '11 at 00:02
  • @David: that's still errors if the result is incorrect. It's data errors, not errors in the analysis of that data, but still errors. The reverse can also happen: correct results from incorrect data. Still error, but you don't realise it. I've dealt with that myself during my thesis work in nuclear physics. A formula used was incorrect, but yielded results that were, within the margin of error applied, correct. Once we got that margin down by several orders of magnitude by improving our algorithms and data, the error became apparent. – jwenting Apr 11 '11 at 06:15
  • Just a note: work is being done to combat publication bias, as there are clinical trial registries in many countries. In essence, the idea is that no results of a trial will be published unless it was registered prior to doing the study. – Illotus Jan 08 '12 at 15:34
  • I'd love to know about the reason for downvotes: this is an important question and I'd like to make sure we ask it in the clearest and most useful way. So improve it rather than downvoting it or at least give your reasons. Please. – matt_black Jan 11 '12 at 10:58

2 Answers


The "most epidemiology studies are wrong", perhaps best advanced by John P. A. Ioannidis in Why Most Published Research Findings Are False both extremely common and somewhat flawed, though there are absolutely some worthwhile points in his article.

Perhaps what amuses me most is how readily its conclusion has been embraced, even though, by its own argument, perhaps it shouldn't be.

Goodman and Greenland outline some problems with the paper here: http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0040168

Mainly, the flaw lies in how Ioannidis structures his argument: the odds are stacked against certain types of studies, and the evidentiary value of effect measures and actual p-values, rather than just the binary "is/is not significant at 0.05", is discounted.
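To see where the odds get stacked, here is a minimal sketch (my code, not anything from either paper) of the positive-predictive-value arithmetic that Ioannidis' argument rests on: the probability that a flagged "significant" finding is true depends heavily on the assumed pre-study probability, power and bias, so the headline conclusion follows largely from assuming unfavourable values for those inputs.

```python
def ppv(prior, power=0.8, alpha=0.05, bias=0.0):
    """Post-study probability that a 'significant' finding reflects a true effect.

    prior: assumed pre-study probability that the tested relationship is real
    power: probability of detecting a true effect (1 - beta)
    alpha: significance threshold (false-positive rate under the null)
    bias:  fraction of otherwise-negative results reported as positive anyway
    (This follows the PPV formula in Ioannidis' paper; the example numbers below
    are illustrative, not his.)
    """
    true_pos = prior * power + prior * (1 - power) * bias
    false_pos = (1 - prior) * alpha + (1 - prior) * (1 - alpha) * bias
    return true_pos / (true_pos + false_pos)

for prior in (0.5, 0.1, 0.01):
    print(f"prior={prior:<4}  PPV={ppv(prior):.2f}  PPV with bias=0.2: {ppv(prior, bias=0.2):.2f}")
```

With a generous prior the PPV stays high; with the low priors and non-trivial bias assumed for exploratory fields it drops below one half, which is exactly the "most findings are false" claim.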

There are other examples of supposedly "wrong" findings that were later "disproved" (usually an observational epidemiology study finding one thing, and a subsequent RCT finding something different) where the discrepancy can be traced back to the studies asking different questions. An exemplar paper might be: Hernán MA, Alonso A, Logan R, Grodstein F, Michels KB, Willett WC, Manson JE, Robins JM. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease (with discussion). Epidemiology 2008; 19:766-779.

What I take from all this is the following:

  1. Significance testing in and of itself is not proof. There's far more value in providing the p-value, and even more value in providing an actual estimate of effect (a short sketch after this list illustrates the difference).
  2. No study can provide singular, definitive proof of something. Not an observational study, and not an RCT (which are far from free of bias). Nor should any study claim to.
  3. The question of "wrong" is an odd one in science. A more useful question would be "are most published results likely to advance their field?". Because a wrong study that provides fodder for further analysis, new methods and more thoughtful science is still profoundly useful.
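As an illustration of point 1, here is a small sketch (my own, with hypothetical data) contrasting the one-bit "significant or not" summary with an actual effect estimate and confidence interval:

```python
# Hypothetical two-group comparison: report the estimate and interval, not just p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treated = rng.normal(loc=0.3, scale=1.0, size=50)  # hypothetical treatment group
control = rng.normal(loc=0.0, scale=1.0, size=50)  # hypothetical control group

diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
_, p_value = stats.ttest_ind(treated, control)

# "Significant or not" is one bit of information; the estimate and (approximate,
# normal-based) 95% interval show how big the effect is and how precisely it is pinned down.
print(f"p-value: {p_value:.3f}")
print(f"estimated difference: {diff:.2f} (95% CI {diff - 1.96 * se:.2f} to {diff + 1.96 * se:.2f})")
```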

EDIT. There is an additional way to directly estimate the scale of the problem: try to reproduce reported research results. Though there are more reasons than statistical fluctuation alone why a result may fail to reproduce, such attempts give an order-of-magnitude check on the reliability of published research and on Ioannidis' estimate. The bottom line of such studies is that something like 2 out of 3 attempts to reproduce previous research fail to do so. For a more detailed summary see the answers here: Can up to 70% of scientific studies not be reproduced?

Fomite
  • +1 for a good overview in totum, but particularly for point #2. – Dave May 16 '12 at 15:24
  • @EpiGrad I hope you don't mind the edit. I thought the addition of a reference to actual attempts to reproduce studies fitted with your answer better than with a new answer. – matt_black Nov 06 '12 at 16:56
  • @matt_black Works for me, thanks for the improvement. – Fomite Nov 06 '12 at 22:45

I made that point referring to this post, in which I question the conclusion that spanking causes aggressive behavior: Is spanking an effective form of discipline for children?

There is a publication bias, especially in the social sciences (you can't publish insignificant effects). When you can run controlled experiments, as in chemistry, physics, or clinical trials, the results are much more likely to be true. However, in observational studies the temptation is to go from association to causation, and on repeated study the claimed effect often does not hold.

More on the subject:

http://www.johndcook.com/blog/2011/01/17/scientific-results-fading-over-time/

Glen
  • Well, you can't easily run and publish replicable experiments on *spanking children*. Replicable experimental research is published in some of the social sciences. – Paul Apr 11 '11 at 06:53
  • This answer provides a bunch of speculation and no evidence to back up the claims. It doesn't even answer the question. What does "more likely to be true" mean? Can you generalize Ioannidis paper to the whole field? – Christian Apr 12 '11 at 20:21