
In recent years there has been an increasing number of complaints that many peer-reviewed scientific publications are of poor quality (see, for example, these questions: Can up to 70% of scientific studies not be reproduced? and Are most published results relying on measures of statistical significance likely to be false?).

A recent article in Vox, This is why you shouldn't believe that exciting new medical study, makes several more specific allegations about papers relevant to medicine.

For example:

Most medical studies are wrong

...only 3,000 of 50,000 new journal articles published each year are well-designed and relevant enough to inform patient care. That's 6 percent.

That seems like a very specific and alarming claim. Note that the claim isn't that the results are wrong. The claim is that, even without looking at the results, the very design of the study is too flawed for any result to be useful for patient care.

How well founded is this claim?

matt_black
  • Veritasium (YouTube channel) has a great video about the peer review process. – ChanganAuto Oct 23 '16 at 14:00
  • The question assumes a relation between *poor quality peer-reviewed scientific publications* (text) and *medical studies too badly designed to improve patient care?* (title). That relation is not automatic. Please change one or the other. – JanDoggen Oct 23 '16 at 20:49
  • @JanDoggen My question doesn't *assume* any strong relationship. I simply report the problem of poor quality peer review as background. The point of the question and the specific claim is about badly designed medical studies. A specific claim has been made; I report the more general problem merely as background. This should not impact the question to be answered in any way. – matt_black Oct 23 '16 at 23:34
  • Pure conjecture on my part: Medical studies are probably expensive. If you have an idea that you want to study, it therefore makes sense to test it with a small sample first. If your small test gives a positive result, you may try and find funding to test it with a medium sized sample. If this medium test gives a positive result, you may try and find funding for a full-size test which is big enough to actually inform patient care. I suspect that it's fairly easy to find funding for a small test, difficult for a medium test, and very difficult for a big test. – AndyT Nov 24 '16 at 11:06
  • Hence, although I have zero evidence or research to back this up, I find the "3000 out of 50000" number completely plausible and not surprising. – AndyT Nov 24 '16 at 11:08

1 Answer


The claim is based on this: http://ebn.bmj.com/content/8/2/39.full

bmjupdates+ uses the same, explicit and reproducible quality filters as Evidence-Based Medicine (http://hiru.mcmaster.ca/ebmj/Ebmp_p.htm) and Evidence-Based Nursing. Applying these criteria to each article in over 110 premier clinical journals (about 50 000 articles per year), about 3000 articles (6%) pass muster—that is, have adequate methods to support their conclusions for key aspects of clinical care.

Keep in mind that this is bmjupdates promoting its own service, and part of the point of that service is to filter for a few of the best papers rather than to include everything that might be useful. That is sane and sensible, since clinicians cannot read 50,000 papers per year.

I'm going to withhold judgement on whether "most" means more than 50%, but I will address the 6% figure.

It is not true that 94% are too badly designed to improve patient care.

To be more specific, only about 3,000 papers pass these filters; the source does not provide a breakdown of why any particular paper failed (a simplified sketch of how such a filter might work in code follows the list):

Criteria for Review and Selection for Abstracting:

General

All English-language original and review articles in an issue of a candidate journal are considered for abstracting if they concern topics important to the clinical practice of internal medicine, general and family practice, surgery, psychiatry, paediatrics, or obstetrics and gynaecology. Access to foreign-language journals is provided through the systematic reviews we abstract, especially those in the Cochrane Library, which summarises articles taken from over 800 journals in several languages.

Prevention or treatment; quality improvement

• Random allocation of participants to interventions

• Outcome measures of known or probable clinical importance for ≥ 80% of the participants who entered the investigation.

Diagnosis

• Inclusion of a spectrum of participants, some (but not all) of whom have the disorder or derangement of interest

• Each participant must receive the new test and the diagnostic standard test

• Either an objective diagnostic standard or a contemporary clinical diagnostic standard with demonstrably reproducible criteria for any subjectively interpreted component

• Interpretation of the test without knowledge of the diagnostic standard result

• Interpretation of the diagnostic standard without knowledge of the test result.

Prognosis

• An inception cohort of persons, all initially free of the outcome of interest

• Follow-up of ≥ 80% of patients until the occurrence of either a major study end point or the end of the study.

Causation

• Observations concerning the relation between exposures and putative clinical outcome

• Prospective data collection with clearly identified comparison group(s) for those at risk for the outcome of interest (in descending order of preference from randomised controlled trials, quasi-randomised controlled trials, nonrandomised controlled trials, cohort studies with case by case matching or statistical adjustment to create comparable groups, to nested case control studies)

• Masking of observers of outcomes to exposures (this criterion is assumed to be met if the outcome is objective).

Economics of health care programmes or intervention

• The economic question must compare alternative courses of action in real or hypothetical patients

• The alternative diagnostic or therapeutic services or quality improvement strategies must be compared on the basis of both the outcomes they produce (effectiveness) and the resources they consume (costs)

• Evidence of effectiveness must come from a study (or studies) that meets criteria for diagnosis, treatment, quality assurance, or review articles

• Results should be presented in terms of the incremental or additional costs and outcomes incurred and a sensitivity analysis should be done.

Clinical prediction guides

• The guide must be generated in 1 set of patients (training set) and validated in an independent set of real not hypothetical patients (test set), and must pertain to treatment, diagnosis, prognosis, or causation.

Differential diagnosis

• A cohort of patients who present with a similar, initially undiagnosed but reproducibly defined clinical problem

• Clinical setting is explicitly described

• Ascertainment of diagnosis for 80% of patients using a reproducible diagnostic workup strategy and follow-up until patients are diagnosed, or follow-up of 1 month for acute disorders or ≥ 1 year for chronic or relapsing disorders.

Systematic reviews

• The clinical topic being reviewed must be clearly stated; there must be a description of how the evidence on this topic was tracked down, from what sources, and with what inclusion and exclusion criteria

• ≥ 1 article included in the review must meet the above-noted criteria for treatment, diagnosis, prognosis, causation, quality improvement, or the economics of health care programmes.

Source: Purpose and procedure
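
To make the mechanics of this kind of screening concrete, here is a minimal, hypothetical Python sketch of how a few of the criteria above could be modelled as per-category pass/fail rules. The field names and the simplified rules are illustrative assumptions only, not the actual bmjupdates implementation:

    # Hypothetical sketch: model each article as a record and each
    # category's criteria as a pass/fail rule. Simplified for
    # illustration; not the actual bmjupdates filter.
    from dataclasses import dataclass

    @dataclass
    class Article:
        category: str                # e.g. "treatment", "diagnosis", "prognosis"
        randomised: bool = False     # were participants randomly allocated?
        follow_up_rate: float = 0.0  # fraction of participants with measured outcomes
        blinded_interpretation: bool = False  # test read blind to the standard's result?

    def passes_filter(a: Article) -> bool:
        """Apply the (simplified) criteria for the article's category."""
        if a.category == "treatment":
            # Random allocation plus outcomes for >= 80% of participants
            return a.randomised and a.follow_up_rate >= 0.80
        if a.category == "prognosis":
            # Follow-up of >= 80% of patients (inception cohort assumed)
            return a.follow_up_rate >= 0.80
        if a.category == "diagnosis":
            # Simplified here to a single blinding requirement
            return a.blinded_interpretation
        return False  # unknown categories fail by default

    articles = [
        Article("treatment", randomised=True, follow_up_rate=0.92),
        Article("treatment", randomised=False, follow_up_rate=0.95),
        Article("prognosis", follow_up_rate=0.60),
    ]
    passed = [a for a in articles if passes_filter(a)]
    print(f"{len(passed)} of {len(articles)} pass "
          f"({100 * len(passed) / len(articles):.0f}%)")

Note that a filter like this is binary: an article that misses any required element for its category fails outright, which matches the fact that the source reports only a pass count and no breakdown of reasons for failure.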

That does not mean that every paper which fails to pass these filters is terrible or useless.

A paper may still be informative or useful, but it may be harder to parse or harder to compare with other results, or the subject being studied may make it impossible to meet the criteria above.

For example, a paper which is simply a case study of an individual patient with a rare form of brain damage may be quite informative to neurologists, but there is no way you could ethically inflict brain damage on a randomly allocated cohort of people.

These filters simply allow us to find the papers which are the most informative and most easily used in combination with other data. They might be described as the most systematically useful, but they're not the only useful ones.

Murphy
  • You haven't provided a link that proves that 3,000 or more papers pass the stated criteria. This only provides standards that a group claims to use, not how many pass those criteria or whether we should consider the group ranking them trustworthy. Further, while it is useful to mention other types of articles that could be valuable without passing the criteria, without an indication of how many, or whether, such articles are published, we don't know if the types of articles you mention are a non-negligible fraction of the total articles published. – dsollen Nov 23 '16 at 19:13