Critical Appraisal - Therapy Articles - Validity

Introduction

We will look first at the study design and outcomes - make sure you identify these before moving on.  

Then look at the "risk of bias" section for a basic coverage of the common validity issues.  

The GRADE method section is optional - another way to look at validity, particularly for a set of studies.  

When you're done with this section, move on, or return to the introduction page.

Study Design and Outcomes

Risk of Bias

The Cochrane Collaboration uses this method as its preferred way of assessing the validity of the studies included in its systematic reviews.

Most worksheets and books emphasize the need for randomization and allocation concealment, but this tool asks questions that highlight the reasons why these techniques are important in designing studies.

The Cochrane Handbook provides more detail on the Risk of Bias tool.

The questions to consider:

Was the allocation sequence adequately generated? 

Randomization is the best way to "allocate" subjects in your study to the comparison groups.  There are right and wrong ways to conduct randomization, but in general a random number table or computer-generated randomization are the best approaches.  Once subjects are randomized, they should remain in that group throughout the study and during the analysis.  This preserves the best chance that the groups are equal in all ways other than the intervention of interest. 
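To make this concrete, here is a minimal sketch (in Python, with illustrative names and arm labels) of one common form of computer-generated allocation - balanced blocks, which keep the group sizes roughly equal as subjects accrue:

```python
# A minimal sketch of computer-generated, block-randomized allocation.
# Two arms ("A" and "B") and a block size of 4 are assumed for illustration.
import random

def block_randomize(n_subjects, block_size=4, seed=42):
    """Return an allocation sequence of 'A'/'B' in balanced blocks."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_subjects:
        # Each block contains equal numbers of each arm, shuffled randomly
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_subjects]

print(block_randomize(10))  # e.g. ['B', 'A', 'A', 'B', ...]
```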

Was allocation adequately concealed? 

This one is frequently difficult to understand.  Allocation concealment is not intervention "blinding".  Instead, the point is to ensure that the person recruiting for the study (the person who invites people into the study, applies the inclusion and exclusion criteria, and consents them for the study) does not know the group into which the subject will be placed.  In other words, the recruiter is tasked with getting the appropriate pool of patients into the study.  Randomization should happen after that point.  If this is not done properly, a recruiter could choose not to enroll a subject (even though they meet the inclusion criteria) because they did not want the subject in a particular group.  Granted, this seems far-fetched, but unconscious bias may also play a role here.  There is empirical evidence (as with the other items in this list) that failure to conceal allocation can introduce bias (and incorrectly overestimate the effect of the intervention).

Was knowledge of the allocated intervention adequately prevented during the study? 

Blinding does this.  It's important that the groups, which were carefully randomized to make them the same, are treated the same throughout the study.  The most important part of this is making sure subjects don't know which treatment arm they're in.  We use the placebo effect for this purpose, telling all the subjects that they MAY be getting the intervention.  This applies both to active treatment vs. no treatment and to two treatments compared with each other.  We use "dummy" interventions to help with blinding - a placebo pill is an example of a dummy.  It must look, taste and smell like the active pill.  If we are comparing an injection and an oral medication, we have to use a "double-dummy" approach - one group gets a placebo injection and an active pill, and the other group gets an active injection and a placebo pill.  Researchers may want to "test" the effectiveness of their blinding by asking subjects whether they thought they were on active treatment or control. 
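As an illustration of that last point, here is a minimal sketch of one simple way such a blinding "test" can be analyzed: tabulate subjects' guesses against their actual arm and check for an association.  The counts below are invented:

```python
# Tabulate end-of-trial guesses against actual assignment and test for
# association; invented counts for illustration only.
from scipy.stats import chi2_contingency

#            guessed active   guessed placebo
table = [
    [55, 45],   # actually on active treatment
    [48, 52],   # actually on placebo
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"p = {p:.2f}")  # a small p suggests subjects could tell their arm
```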

Were incomplete outcome data adequately addressed? 

Incomplete outcome data usually arise from subjects dropping out of the study.  This can cause two problems.  First, the withdrawals and dropouts may be unequal between groups, which could undo the equalizing effect of the randomization.   Remember, it's not OK to just switch dropouts from the active group to the control group - this would result in a non-randomized study (you are allocating people based on their adherence to the treatment regimen, not randomly).  Second, the dropouts may reduce the overall power of the study to find an effect.  Power depends on the number of subjects in the study, and this is carefully calculated for the study protocol prior to recruitment, so if subjects are lost, it may not be possible to find a difference between groups even if one exists.  Generally, if fewer than 5% of the subjects are lost, there is no problem.  If more than 20% of subjects are lost, this is frequently fatal to the analysis.  Between 5% and 20% loss, the researchers should explain the possible effects of this loss on the analysis.  
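Those thresholds are rough rules of thumb, not hard cutoffs.  A quick sketch of the arithmetic, with hypothetical numbers, looks like this:

```python
# Check loss to follow-up against the rough 5% / 20% thresholds above.
def attrition_flag(randomized, completed):
    lost = (randomized - completed) / randomized
    if lost < 0.05:
        return f"{lost:.1%} lost - unlikely to be a problem"
    if lost > 0.20:
        return f"{lost:.1%} lost - frequently fatal to the analysis"
    return f"{lost:.1%} lost - authors should discuss the impact"

# Hypothetical trial: 300 randomized, 264 with complete outcome data
print(attrition_flag(randomized=300, completed=264))  # 12.0% lost
```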

Are reports of the study free of suggestion of selective outcome reporting?

Increasingly, clinical trials are required to register their protocols (the initial designs of the study) on a government web site such as ClinicalTrials.gov, so that journal editors and others can examine the original purpose and the outcomes the researchers intended to study.  This requirement arose because of cases of researchers not publishing data that didn't "work out" for them or for their funders.  Remember that studies are designed to have a primary outcome - the one that is supposed to be most important to the researchers, and also the one on which many of the statistical calculations are based.  If this outcome is not reported, for whatever reason, it makes the entire trial report suspect. 

Was the study free of other problems that could put it at a high risk of bias?

A lot of concepts fall into this category - ghostwriting of the paper, pharmaceutical funding, and unequal treatment of the two groups. The most important are:

Intention-to-treat analysis - subjects must be analyzed in the groups to which they were randomized.  It is inappropriate to switch a subject from the intervention group to the control group simply because they did not comply with their intervention (medication, surgery, whatever).  The principal problem with switching is that it breaks the randomization process - you've now created groups that do not simply differ randomly.  Second, ITT is a "real-world" test of an intervention - if patients won't take the medication because of safety issues, bad side effects, or cost, then it won't matter how well the medication can work.  It is OK to do a per-protocol analysis (subjects analyzed in the group whose protocol they actually followed) if the researchers simultaneously present an intention-to-treat analysis for comparison.
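To see the difference on toy data, here is a minimal sketch contrasting the two analyses; the subjects, adherence, and outcomes are all invented:

```python
# Intention-to-treat vs. per-protocol on invented data: under ITT every
# subject stays in their randomized arm; per-protocol keeps only adherers.
subjects = [
    # (randomized arm, adhered to protocol, had the outcome event)
    ("active", True, False), ("active", True, False), ("active", False, True),
    ("control", True, True), ("control", True, False), ("control", False, True),
]

def event_rate(arm, per_protocol=False):
    group = [s for s in subjects
             if s[0] == arm and (s[1] or not per_protocol)]
    return sum(s[2] for s in group) / len(group)

print("ITT:         ", event_rate("active"), "vs", event_rate("control"))
print("Per-protocol:", event_rate("active", True), "vs", event_rate("control", True))
```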

Power - You should find a power calculation near the end of the methods section; check that the researchers got that number of subjects into their study and kept them in it (or at least had data about them if they withdrew).  You may want to look at the power analysis yourself: the alpha level is usually set at 0.05 or 0.01, and the power (1 - beta) is usually set at 0.8 or 0.9.  The key for clinicians to examine is the "difference" or "effect size" that the researchers wanted to be able to detect at those levels of alpha and power.  Was it a 10% difference in mortality?  Is that a clinically significant difference if they can detect it? (It sure sounds like it would be.)  Reading these with a clinician's mind is important.
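For the mortality example above, here is a worked sample-size calculation using the standard two-proportion formula; the 30% vs. 20% mortality rates (a 10% absolute difference) are assumed for illustration:

```python
# Sample size per group to detect a difference between two proportions,
# using the standard normal-approximation formula.
from math import ceil
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# 30% vs. 20% mortality, alpha = 0.05, power = 0.8
print(n_per_group(0.30, 0.20))   # roughly 290 subjects per group
```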

Adjustment for baseline differences in allocation - despite randomization, the groups may still differ by chance in important subject characteristics.  Authors should acknowledge these differences and ensure that the results take them into account - possibly by regression analysis or another multivariable method. 
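As a sketch of what such an adjustment can look like, here is a minimal example using logistic regression on simulated data; "age" stands in for whatever baseline characteristic turned out unbalanced between arms:

```python
# Covariate adjustment with logistic regression on simulated data;
# "age" is a stand-in for any chance baseline imbalance.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),   # 1 = intervention arm
    "age": rng.normal(60, 10, n),         # baseline covariate
})
# Simulate a binary outcome that depends on both age and treatment
logit_p = -4 + 0.05 * df["age"] - 0.5 * df["treatment"]
df["outcome"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# Compare the unadjusted and age-adjusted treatment effects (log odds ratios)
unadjusted = smf.logit("outcome ~ treatment", data=df).fit(disp=0)
adjusted = smf.logit("outcome ~ treatment + age", data=df).fit(disp=0)
print(unadjusted.params["treatment"], adjusted.params["treatment"])
```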

GRADE Method

GRADE is an internationally adopted framework for assessing a body of evidence (multiple studies) that informs a guideline recommendation.  See more about grading evidence. 

Steps of critical appraisal of a therapy study