Critical Appraisal of Diagnostic Test Studies - Validity

Introduction

The VALIDITY of a diagnostic test study (cohort or case-control) can be assessed primarily by answering the following (long) question:

Was there an independent, blind comparison to a recognized reference standard for the condition in an appropriate spectrum of patients?

Criteria

  • In a diagnostic test study, the subjects should get both the test of interest and the reference standard for the condition so that the results of these two tests can be compared.

    • recognized reference standard - this test should be a non-controversial, definitive test used to diagnose the condition - e.g., a pathologic diagnosis, a confirmatory x-ray, etc.

      • Sometimes - with depression, or other syndromic diagnoses - deciding whether the reference standard is adequate requires some judgment. A structured interview by a psychiatrist using DSM-IV criteria for depression might be a reasonable reference standard for depression.

    • independent - ideally, every subject should get both tests, regardless of the result of either one. If the researchers perform the reference standard only when the test of interest is positive, the comparison is not independent. For example, the researchers may want to reserve an invasive reference standard (a procedure or surgery) for patients with a positive test of interest; this is understandable, but it still compromises the validity of the study.

    • blind - the researchers performing the test of interest should not know the results of the reference standard, and vice versa. Knowing the other result might bias the interpretation of either test.

    • appropriate spectrum of patients - the study should include patients with a range of disease severity and progression, so that it can assess whether the test works well across the spectrum of disease.
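
The comparison described in the criteria above can be sketched in a few lines of Python. This is only an illustration: the function names and the ten paired results are hypothetical and invented for this example, not taken from any study. It cross-tabulates the test of interest against the reference standard, then shows what can happen to the apparent sensitivity and specificity when the reference standard is performed only on subjects with a positive test (a non-independent comparison).

```python
def two_by_two(test_results, reference_results):
    """Cross-tabulate paired results (True = positive) into TP, FP, FN, TN."""
    if len(test_results) != len(reference_results):
        raise ValueError("each subject needs both a test and a reference result")
    tp = fp = fn = tn = 0
    for test_positive, has_condition in zip(test_results, reference_results):
        if test_positive and has_condition:
            tp += 1
        elif test_positive:
            fp += 1
        elif has_condition:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Proportion of subjects with the condition whose test is positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of subjects without the condition whose test is negative."""
    return tn / (tn + fp)


# Hypothetical results for 10 subjects, invented for illustration only.
test = [True, True, True, False, False, True, False, False, False, False]
reference = [True, True, True, True, False, False, False, False, False, False]

# Independent comparison: every subject gets the reference standard.
tp, fp, fn, tn = two_by_two(test, reference)
print("all subjects verified:", sensitivity(tp, fn), specificity(tn, fp))

# Non-independent comparison: reference standard done only on test-positives.
verified = [(t, r) for t, r in zip(test, reference) if t]
tp_v, fp_v, fn_v, tn_v = two_by_two([t for t, _ in verified],
                                    [r for _, r in verified])
print("test-positives only:  ", sensitivity(tp_v, fn_v), specificity(tn_v, fp_v))
```

With this made-up data, verifying everyone gives a sensitivity of 0.75 and a specificity of about 0.83. Verifying only the test-positives gives a sensitivity of 1.0 and a specificity of 0.0, because false negatives and true negatives can never be observed.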

Other Considerations

Sometimes, researchers study the value of several different tests, or combinations of tests, in diagnosing a condition (e.g., different urine dipstick values (nitrite, leukocyte esterase, blood, etc.) in determining urinary tract infection). In this case, the researchers should work in two stages: one group of patients (the exploratory cohort) should be used to find the tests that show the best characteristics (sensitivity, specificity, etc.) among the larger group of tests; then a different group of patients (the validation cohort) should be studied using only those tests, in the manner described above. This helps prevent an erroneous finding of significance because of multiple comparisons (if you look at many variables, there is a greater chance of something looking significant by chance).
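
The multiple-comparisons problem can be made concrete with a little arithmetic. The sketch below assumes, for simplicity, that the candidate tests are statistically independent of one another and that each is judged at a significance level of 0.05; both are assumptions for illustration, and the function name is hypothetical. Under those assumptions, the chance that at least one truly useless test looks significant is 1 - (1 - 0.05)^n for n tests.

```python
def chance_of_spurious_finding(n_tests, alpha=0.05):
    """Probability that at least one of n_tests useless tests looks
    significant by chance, assuming the tests are independent and each
    is judged at significance level alpha (both are assumptions)."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 10, 20):
    print(n, round(chance_of_spurious_finding(n), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
```

Real candidate tests are usually correlated, so the exact figures will differ, but the direction is the same: the more variables examined, the more likely a chance finding.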

For the UTI example: the researchers perform urine dipsticks on everyone suspected of UTI and compare the results with urine culture results. They may find in the exploratory cohort that the combination of nitrites, blood and WBC count > 5/hpf best predicts UTI. They should then set up a validation cohort study in which this combination is specifically tested (independently, blind, etc.) to diagnose UTI.
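
The two-stage design can be sketched as follows. Everything here is hypothetical: the patient records are invented, the function names are made up for this example, and "best characteristics" is reduced to the highest sum of sensitivity and specificity, which is only one possible criterion. For brevity the sketch ranks single dipstick components rather than combinations.

```python
def sens_spec(records, test_name, reference_name="culture"):
    """Sensitivity and specificity of one test against the reference standard."""
    tp = fp = fn = tn = 0
    for record in records:
        positive, infected = record[test_name], record[reference_name]
        if positive and infected:
            tp += 1
        elif positive:
            fp += 1
        elif infected:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def pick_best_test(records, candidates):
    """Assumed criterion: highest sensitivity + specificity."""
    return max(candidates, key=lambda name: sum(sens_spec(records, name)))


# Stage 1: exploratory cohort, all candidate tests recorded (invented data).
exploratory = [
    dict(culture=True, nitrite=True, leukocyte_esterase=True, blood=True),
    dict(culture=True, nitrite=True, leukocyte_esterase=True, blood=False),
    dict(culture=True, nitrite=False, leukocyte_esterase=True, blood=True),
    dict(culture=True, nitrite=True, leukocyte_esterase=False, blood=False),
    dict(culture=False, nitrite=False, leukocyte_esterase=True, blood=True),
    dict(culture=False, nitrite=False, leukocyte_esterase=False, blood=True),
    dict(culture=False, nitrite=False, leukocyte_esterase=False, blood=False),
    dict(culture=False, nitrite=False, leukocyte_esterase=True, blood=False),
]
candidates = ["nitrite", "leukocyte_esterase", "blood"]
best = pick_best_test(exploratory, candidates)
print("chosen in exploratory cohort:", best, sens_spec(exploratory, best))

# Stage 2: a different group of patients, studied with the chosen test only.
validation = [
    dict(culture=True, nitrite=True),
    dict(culture=True, nitrite=False),
    dict(culture=True, nitrite=True),
    dict(culture=False, nitrite=False),
    dict(culture=False, nitrite=True),
    dict(culture=False, nitrite=False),
]
print("performance in validation cohort:", sens_spec(validation, best))
```

Only the validation-cohort figures should be reported as the test's performance, since the exploratory figures were used to choose the test in the first place.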