Critical Appraisal - Therapy Articles - Validity
Introduction
We will look first at the study design and outcomes - make sure you identify these before moving on.
Then look at the "risk of bias" section for a basic coverage of the common validity issues.
The GRADE method section is optional; it offers another way to look at validity, particularly across a set of studies.
When you're done with this section, move on, or return to the introduction page.
Study Design and Outcomes
Comparators for therapy studies may be placebo, no treatment, or alternative treatments of any sort.
A "dummy" is the item used as a placebo. Dummies are particularly important when you're comparing, say, an oral (pill) therapy with an injectable therapy. To control such a study, you need an active and a dummy pill, as well as an active and a dummy injection.
There are two ideal types of evidence for Therapy questions: systematic reviews and randomized controlled trials.
There may be other types of studies used: prospective or retrospective cohort studies, case-control studies or even case series, but these are all considered low-quality evidence (see below) and need to be done very well and have large effect sizes to be considered useful evidence.
Think about the outcome you are looking for. Is it:
a patient-oriented outcome?
morbidity, mortality, quality of life, etc.,
these are preferred as they can better evaluate for "unintended consequences" of interventions
a disease-oriented outcome?
blood pressures, cholesterol levels, diagnostic testing results
these are intermediate steps to what we really need - the patient-oriented outcomes.
Risk of Bias
This method of assessing validity is used by the Cochrane Collaboration as the preferred way of assessing the validity of studies included in their systematic reviews.
Most worksheets and books emphasize the need for randomization and allocation concealment, but this tool asks questions that highlight the reasons why these techniques are important in designing studies.
See the Cochrane Handbook for more detail on the Risk of Bias tool.
The questions to consider:
Was the allocation sequence adequately generated?
Randomization is the best way to "allocate" subjects in your study to the comparison groups. There are right and wrong ways to conduct randomization, but in general a random number table or computer-generated randomization are the best approaches. Once subjects are randomized, they should remain in that group throughout the study and during the analysis. This preserves the best chance that the groups are equal in all ways other than the intervention of interest.
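As a concrete illustration, here is a minimal Python sketch of computer-generated randomization using permuted blocks, one common approach; the arm labels, block size, and seed are all hypothetical.

```python
# A minimal sketch of computer-generated block randomization.
# Arm labels, block size, and seed are illustrative only.
import random

def block_randomize(n_subjects, block_size=4, arms=("treatment", "control"), seed=None):
    """Return an allocation sequence balanced within each block."""
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_subjects:
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)  # random order within each block
        sequence.extend(block)
    return sequence[:n_subjects]

print(block_randomize(10, seed=42))
```

Balancing within small blocks keeps the group sizes roughly equal as recruitment proceeds, while the shuffle keeps each individual allocation unpredictable.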
Was allocation adequately concealed?
This one is frequently difficult to understand. Allocation concealment is not intervention "blinding". Instead, the point is to ensure that the person recruiting for the study (the person who invites people into the study, applies the inclusion and exclusion criteria, and consents them for the study) does not know the group into which the subject will be placed. In other words, the recruiter is tasked with getting the appropriate pool of patients into the study; randomization should happen after that point. If this is not done properly, a recruiter could choose not to enroll a subject (even though they meet the inclusion criteria) because they did not want the subject in a particular group. Granted, this seems far-fetched, but unconscious bias may also play a role here. There is empirical evidence (as with the other items in this list) that failure to conceal allocation can introduce bias (and incorrectly overestimate the effect of the intervention).
Was knowledge of the allocated intervention adequately prevented during the study?
Blinding does this. It's important that the groups, which were carefully randomized to make them the same, are treated the same throughout the study. The most important part of this is making sure subjects don't know which treatment arm they're in. We use the placebo effect for this purpose, telling all the subjects that they MAY be getting the intervention. This goes for active treatment vs. no treatment as well as for two treatments compared to each other. We use "dummy" interventions to help with blinding; a placebo pill is an example of a dummy. It must look, taste, and smell like the active pill. If we are comparing an injection and an oral medication, we have to use a "double-dummy" approach: one group gets the placebo injection and the active pill, and the other group gets the active injection and the placebo pill. Researchers may want to "test" the effectiveness of their blinding by asking subjects whether they thought they were on active treatment or control.
Were incomplete outcome data adequately addressed?
Incomplete outcome data usually arise from subjects dropping out of the study. This can cause two problems. First, the withdrawals and dropouts may be unequal between groups, which could undo the equalizing effect of the randomization. Remember, it's not OK to just switch dropouts from the active group to the control group; this results in a non-randomized study (you're allocating people based on their adherence to the treatment regimen, not randomly). Second, the dropouts may reduce the overall power of the study to find an effect. Power depends on the number of subjects in the study, and this is carefully calculated for the study protocol prior to recruitment, so if subjects are lost, it may not be possible to find a difference between groups even if one exists. Generally, if fewer than 5% of the subjects are lost, there is no problem. If greater than 20% of subjects are lost, this is frequently fatal to the analysis. Between 5% and 20% loss, the researchers should explain the possible effects of this loss on the analysis.
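For a quick illustration of that rule of thumb, here is a small Python sketch; the 5% and 20% cutoffs are the thresholds quoted above (a convention, not a universal standard), and the enrollment numbers are made up.

```python
# Flag loss to follow-up using the 5% / 20% rule of thumb described above.
def attrition_flag(enrolled, completed):
    pct_lost = 100 * (enrolled - completed) / enrolled
    if pct_lost < 5:
        verdict = "probably no problem"
    elif pct_lost <= 20:
        verdict = "questionable - authors should address the impact"
    else:
        verdict = "frequently fatal to the analysis"
    return pct_lost, verdict

pct, verdict = attrition_flag(enrolled=400, completed=340)
print(f"{pct:.1f}% lost to follow-up: {verdict}")  # 15.0% lost ... questionable
```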
Are reports of the study free of suggestion of selective outcome reporting?
Increasingly, clinical trials are required to register their protocols (the initial designs of their studies) on a public registry such as ClinicalTrials.gov, so that journal editors and others can examine the original purpose and the outcomes the researchers intended to study. This has been done because of cases of researchers not publishing data that didn't "work out" for them or for their funders. Remember that studies are designed to have a primary outcome, which is supposed to be the most important to the researchers, but is also the one on which many of the statistical calculations are based. If this is not reported, for whatever reason, it makes the entire trial report suspect.
Was the study free of other problems that could put it at a high risk of bias?
There are a lot of concepts that fall in this category - ghost writing of the paper, pharmaceutical funding, and unequal treatment of the two groups. The most important are:
Intention to treat analysis - subjects must be analyzed in the groups to which they were randomized. It is inappropriate to switch a subject from the intervention group to the control group simply because they did not comply with their intervention (medication, surgery, whatever). The principal problem with this is that it breaks the randomization process: you've now created groups that do not simply differ randomly. Second, ITT is a "real-world" test of an intervention; if patients won't take the medication because of safety issues, bad side effects, or cost, then it won't matter how well the medication can work. It is OK to do a per-protocol analysis (subjects analyzed according to the treatment they actually completed) if the researchers simultaneously present an intention-to-treat analysis for comparison. A toy illustration follows this list.
Power - You should find a power calculation near the end of the methods section, then check that the researchers got that number of subjects into their study and kept them in it (or at least had data about them if they withdrew). You may want to look at the power analysis yourself: the alpha level is usually set at 0.05 or 0.01, and the power (1 - beta) level is usually set at 0.8 or 0.9. The key for clinicians to examine is the "difference" or "effect size" that the researchers wanted to be able to detect at those levels of alpha and power. Was it a 10% difference in mortality? Is that a clinically significant difference if they can detect it? (It sure sounds like it would be.) Bringing a clinician's judgment to these numbers is important. A sample-size sketch follows this list.
Adjustment for baseline differences in allocation - there may be differences between the groups in important subject characteristics that arise by chance despite randomization. Authors should acknowledge these and ensure that the results take them into account, possibly by regression analysis or other multivariable analysis.
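To make the intention-to-treat point concrete, here is a toy Python illustration with entirely hypothetical data, showing how an ITT analysis and a per-protocol analysis of the same trial can disagree.

```python
# Toy data (hypothetical): each record is (randomized_arm, complied, had_event).
subjects = [
    ("treatment", True,  False), ("treatment", True,  False),
    ("treatment", False, True),  ("treatment", False, True),
    ("control",   True,  True),  ("control",   True,  False),
    ("control",   True,  True),  ("control",   True,  False),
]

def event_rate(records):
    return sum(had_event for _, _, had_event in records) / len(records)

# ITT: analyze everyone in the arm they were randomized to.
itt_treat = event_rate([s for s in subjects if s[0] == "treatment"])
# Per-protocol: only subjects who complied with their assigned arm.
pp_treat = event_rate([s for s in subjects if s[0] == "treatment" and s[1]])

print(f"ITT treatment event rate:          {itt_treat:.0%}")  # 50%
print(f"Per-protocol treatment event rate: {pp_treat:.0%}")   # 0%
```

Because the non-compliers happened to be the subjects who had events, the per-protocol analysis makes the treatment look far better than the ITT analysis does, which is exactly the bias ITT protects against.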
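And here is a sketch of the kind of sample-size calculation described under Power, using the statsmodels library; the 20% vs. 10% mortality figures and the alpha/power settings are illustrative, not taken from any particular study.

```python
# Sample size to detect a 10-percentage-point absolute difference in
# mortality (20% control vs 10% treatment) at alpha = 0.05, power = 0.8.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect_size = proportion_effectsize(0.20, 0.10)  # Cohen's h for two proportions
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Subjects needed per group: {n_per_group:.0f}")  # roughly 100
```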
GRADE Method
GRADE is an internationally adopted framework for assessing a body of evidence (multiple studies) that informs a guideline recommendation. See more about grading evidence.
Steps of critical appraisal of a therapy study
What is the study design?
Randomized controlled trial --> starts as High Quality evidence
Cohort or case-control studies --> start as Low Quality evidence
Certain characteristics of the study may change the quality assessment
Characteristics that INCREASE quality:
Large Magnitude of Results
e.g., a large relative risk, odds ratio, relative risk reduction, absolute risk reduction, or number needed to treat (a worked example follows this list)
Dose-response gradient
the more of an intervention, the more of an effect
An effect was seen despite the plausible biases that would otherwise REDUCE the treatment effect
if it's a study about a drug that reduces heart disease, and there were more smokers in the group that got the drug, we would expect that group to do worse...but if they do better (assuming all else is equal), then the drug must be pretty good indeed!
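To see what these magnitude measures look like in practice, here is a short Python example computed from a hypothetical 2x2 table; all counts are made up.

```python
# Effect measures from a hypothetical 2x2 table.
events_treat, n_treat = 30, 200   # 15% event rate on treatment
events_ctrl,  n_ctrl  = 60, 200   # 30% event rate on control

risk_treat = events_treat / n_treat
risk_ctrl  = events_ctrl / n_ctrl

rr  = risk_treat / risk_ctrl                           # relative risk
odds_treat = events_treat / (n_treat - events_treat)
odds_ctrl  = events_ctrl / (n_ctrl - events_ctrl)
or_ = odds_treat / odds_ctrl                           # odds ratio
rrr = 1 - rr                                           # relative risk reduction
arr = risk_ctrl - risk_treat                           # absolute risk reduction
nnt = 1 / arr                                          # number needed to treat

print(f"RR={rr:.2f}  OR={or_:.2f}  RRR={rrr:.0%}  ARR={arr:.0%}  NNT={nnt:.1f}")
# RR=0.50  OR=0.41  RRR=50%  ARR=15%  NNT=6.7
```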
Characteristics that DECREASE quality
Study limitations
lack of allocation concealment
lack of blinding – especially if the outcomes are subjective (a complex diagnosis or a subjective amount of improvement)
large losses to follow-up (less than 5% loss is OK, between 5% and 20% is questionable, greater than 20% loss is a fatal flaw)
failure to adhere to an intention to treat analysis
stopping early for benefit (you might miss harmful outcomes later)
failure to report outcomes (typically those for which no effect was observed)
Inconsistency of results
compared to other studies you know (or other studies in the set you're evaluating)
Indirectness of evidence
differences in population, intervention, comparator or outcome between your original question and this study
Imprecision
wide confidence intervals (usually caused by small study populations; a sketch follows this list)
Reporting bias
publication bias
is it possible that studies that contradict this one have not been published?
is it possible that the authors are not reporting all the outcomes evaluated in the study?
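As a quick illustration of imprecision, the Python sketch below shows how sample size drives confidence-interval width, using a simple normal-approximation (Wald) interval for a single proportion; the counts are hypothetical.

```python
# Wald (normal-approximation) 95% CI for a proportion: same 20% event
# rate, very different precision depending on sample size.
from math import sqrt

def wald_ci(events, n, z=1.96):
    p = events / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

print(wald_ci(10, 50))     # ~ (0.089, 0.311) - wide
print(wald_ci(200, 1000))  # ~ (0.175, 0.225) - narrow
```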
Final Quality Assessment
Decide how many of the additional characteristics above apply and how far the study is "moved" from its baseline assessment. Remember: the more biases present in a study, the more likely it is that further (well-done) research will change the overall results.
High Quality- Further research is very unlikely to change our confidence in the estimate of effect.
Moderate Quality - Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low Quality - Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very Low Quality - Any estimate of effect is very uncertain.