A systematic review classifies sources of bias and variation in diagnostic test accuracy studies

Journal of Clinical Epidemiology - Volume 66 - Pages 1093-1104 - 2013
Penny F. Whiting (1,2), Anne W.S. Rutjes (3,4), Marie E. Westwood (1), Susan Mallett (5)
(1) Kleijnen Systematic Reviews Ltd, Unit 6, Escrick Business Park, Riccall Road, Escrick, York YO19 6FD, United Kingdom
(2) School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol BS8 2PS, United Kingdom
(3) Division of Clinical Epidemiology & Biostatistics, Institute of Social and Preventive Medicine, University of Bern, Finkenhubelweg 11, 3012 Bern, Switzerland
(4) Centre for Aging Sciences (Ce.S.I.), G. d’Annunzio University Foundation, Chieti, Italy
(5) Department of Primary Care Health Sciences, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, United Kingdom

References

Whiting, 2011, QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies, Ann Intern Med, 155, 529, 10.7326/0003-4819-155-8-201110180-00009
Lord, 2011, Target practice: choosing target conditions for test accuracy studies that are relevant to clinical practice, BMJ, 343, d4684, 10.1136/bmj.d4684
Whiting, 2004, Sources of variation and bias in studies of diagnostic accuracy: a systematic review, Ann Intern Med, 140, 189, 10.7326/0003-4819-140-3-200402030-00010
Begg, 1987, Biases in the assessment of diagnostic tests, Stat Med, 6, 411, 10.1002/sim.4780060402
Lijmer, 1999, Empirical evidence of design-related bias in studies of diagnostic tests, JAMA, 282, 1061, 10.1001/jama.282.11.1061
Boyer, 2009, Effects of bias on the results of diagnostic studies of carpal tunnel syndrome, J Hand Surg Am, 34, 1006, 10.1016/j.jhsa.2009.02.018
Medeiros, 2007, The effects of study design and spectrum bias on the evaluation of diagnostic accuracy of confocal scanning laser ophthalmoscopy in glaucoma, Invest Ophthalmol Vis Sci, 48, 214, 10.1167/iovs.06-0618
Burch, 2006, 86
Rutjes, 2006, Evidence of bias and variation in diagnostic accuracy studies, CMAJ, 174, 469, 10.1503/cmaj.050090
Rutjes, 2005, 121
Biesheuvel, 2008, Advantages of the nested case-control design in diagnostic research, BMC Med Res Methodol, 8, 10.1186/1471-2288-8-48
Whiting, 2003, The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews, BMC Med Res Methodol, 3, 25, 10.1186/1471-2288-3-25
Morris, 2011, Methodological quality of test accuracy studies included in systematic reviews in obstetrics and gynaecology: sources of bias, BMC Womens Health, 11, 2011, 10.1186/1472-6874-11-7
Michaud, 2002, Effect of design-related bias in studies of diagnostic tests for ventilator-associated pneumonia, Am J Respir Crit Care Med, 166, 1320, 10.1164/rccm.200202-130CP
Clark, 2004, Bias associated with delayed verification in test accuracy studies: accuracy of tests for endometrial hyperplasia may be much higher than we think!, BMC Med, 2, 18, 10.1186/1741-7015-2-18
Goodacre, 2005, Variation in the diagnostic performance of D-dimer for suspected deep vein thrombosis, QJM, 98, 513, 10.1093/qjmed/hci085
Houssami, 2010, Design-related bias in estimates of accuracy when comparing imaging tests: examples from breast imaging research, Eur Radiol, 20, 2061, 10.1007/s00330-010-1779-6
Philbrick, 1982, The limited spectrum of patients studied in exercise test research. Analyzing the tip of the iceberg, JAMA, 248, 2467, 10.1001/jama.1982.03330190031026
Detrano, 1988, Factors affecting sensitivity and specificity of a diagnostic test: the exercise thallium scintigram, Am J Med, 84, 699, 10.1016/0002-9343(88)90107-6
Geleijnse, 2009, Factors affecting sensitivity and specificity of diagnostic testing: dobutamine stress echocardiography, J Am Soc Echocardiogr, 22, 1199, 10.1016/j.echo.2009.07.006
Haines, 2007, Design-related bias in hospital fall risk screening tool predictive accuracy evaluations: systematic review and meta-analysis, J Gerontol A Biol Sci Med Sci, 62, 664, 10.1093/gerona/62.6.664
Stengel, 2005, Association between compliance with methodological standards of diagnostic research and reported test accuracy: meta-analysis of focused assessment of US for trauma, Radiology, 236, 102, 10.1148/radiol.2361040791
Detrano, 1989, Exercise-induced ST segment depression in the diagnosis of multivessel coronary disease: a meta analysis, J Am Coll Cardiol, 14, 1501, 10.1016/0735-1097(89)90388-4
Curtin, 1997, Body mass index compared to dual-energy x-ray absorptiometry: evidence for a spectrum bias, J Clin Epidemiol, 50, 837, 10.1016/S0895-4356(97)00063-2
Hlatky, 1984, Factors affecting sensitivity and specificity of exercise electrocardiography. Multivariable analysis, Am J Med, 77, 64, 10.1016/0002-9343(84)90437-6
Levy, 1990, Determinant of sensitivity and specificity of electrocardiographic criteria for left ventricular hypertrophy, Circulation, 81, 815, 10.1161/01.CIR.81.3.815
Moons, 1997, Limitations of sensitivity, specificity, likelihood ratio, and Bayes' theorem in assessing diagnostic probabilities: a clinical example, Epidemiology, 8, 12, 10.1097/00001648-199701000-00002
Morise, 1995, Comparison of the sensitivity and specificity of exercise electrocardiography in biased and unbiased populations of men and women, Am Heart J, 130, 741, 10.1016/0002-8703(95)90072-1
Syed, 2008, Effect of referral bias on the diagnostic accuracy of N-13 ammonia and rubidium-82 myocardial perfusion imaging with positron emission tomography in the detection of coronary artery disease, Circulation, 118
Thompson, 2006, Effect of finasteride on the sensitivity of PSA for detecting prostate cancer, J Natl Cancer Inst, 98, 1128, 10.1093/jnci/djj307
Van Turenhout, 2011, Gender disparities in performance of a fecal immunochemical test for detection of advanced neoplasia, Gastroenterology, S405, 10.1016/S0016-5085(11)61665-X
Siccama, 2011, Systematic review: diagnostic accuracy of clinical decision rules for venous thromboembolism in elderly, Ageing Res Rev, 10, 304, 10.1016/j.arr.2010.10.005
Elie, 2008, A methodological framework to distinguish spectrum effects from spectrum biases and to assess diagnostic and screening test accuracy for patient populations: application to the Papanicolaou cervical cancer smear test, BMC Med Res Methodol, 8, 7, 10.1186/1471-2288-8-7
Punglia, 2003, Effect of verification bias on screening for prostate cancer by measurement of prostate-specific antigen, N Engl J Med, 349, 335, 10.1056/NEJMoa021659
Steinbauer, 1998, Ethnic and sex bias in primary care screening tests for alcohol use disorders, Ann Intern Med, 129, 353, 10.7326/0003-4819-129-5-199809010-00002
Bachmann, 2009, Multivariable adjustments counteract spectrum and test review bias in accuracy studies, J Clin Epidemiol, 62, 357, 10.1016/j.jclinepi.2008.02.007
Gilbert, 2003, Meta-analysis of EEG test performance shows wide variation among studies (Provisional abstract), Neurology, 60, 564, 10.1212/01.WNL.0000044152.79316.27
Mastandrea, 2008, Some heterogeneity factors affecting the B-type natriuretic peptides outcome: a meta-analysis, Clin Chem Lab Med, 46, 1687, 10.1515/CCLM.2008.348
Gaffikin, 2008, Avoiding verification bias in screening test evaluation in resource poor settings: a case study from Zimbabwe, Clin Trials, 5, 496, 10.1177/1740774508096139
Miller, 2002, Effects of adjustment for referral bias on the sensitivity and specificity of single photon emission computed tomography for the diagnosis of coronary artery disease, Am J Med, 112, 290, 10.1016/S0002-9343(01)01111-1
Roger, 1997, Sex and test verification bias. Impact on the diagnostic value of exercise echocardiography, Circulation, 95, 405, 10.1161/01.CIR.95.2.405
Santana-Boado, 1998, Diagnostic accuracy of technetium-99m-MIBI myocardial SPECT in women and men, J Nucl Med, 39, 751
Shoaibi, 2009, Gender differences in correlates of troponin assay in diagnosis of myocardial infarction, Transl Res, 154, 250, 10.1016/j.trsl.2009.07.004
Yoon, 2009, The effect of beta-blockers on the diagnostic accuracy of vasodilator pharmacologic SPECT myocardial perfusion imaging, J Nucl Cardiol, 16, 358, 10.1007/s12350-009-9066-0
Barber, 2006, Can we screen for pelvic organ prolapse without a physical examination in epidemiologic studies?, Am J Obstet Gynecol, 195, 942, 10.1016/j.ajog.2006.02.050
Lachs, 1992, Spectrum bias in the evaluation of diagnostic tests: lessons from the rapid dipstick test for urinary tract infection, Ann Intern Med, 117, 135, 10.7326/0003-4819-117-2-135
Melbye, 1993, The spectrum of patients strongly influences the usefulness of diagnostic tests for pneumonia, Scand J Prim Health Care, 11, 241, 10.3109/02813439308994838
O'Connor, 1996, The effect of spectrum bias on the utility of magnetic resonance imaging and evoked potentials in the diagnosis of suspected multiple sclerosis, Neurology, 47, 140, 10.1212/WNL.47.1.140
van der Schouw, 1995, Problems in selecting the adequate patient population from existing data files for assessment studies of new diagnostic tests, J Clin Epidemiol, 48, 417, 10.1016/0895-4356(94)00144-F
Egglin, 1996, Context bias: a problem in diagnostic radiology, JAMA, 276, 1752, 10.1001/jama.1996.03540210060035
Zhang, 2002, Sensitivity of ultrasound screening for congenital anomalies in unselected pregnancies, Rev Epidemiol Sante Publique, 50, 571
Rozanski, 1983, The declining specificity of exercise radionuclide ventriculography, N Engl J Med, 309, 518, 10.1056/NEJM198309013090902
Tobin, 2006, Variable performance of weaning-predictor tests: role of Bayes' theorem and spectrum and test-referral bias, Intensive Care Med, 32, 2002, 10.1007/s00134-006-0439-4
Kittler, 2002, Diagnostic accuracy of dermoscopy, Lancet Oncol, 3, 159, 10.1016/S1470-2045(02)00679-4
Alberg, 2004, The use of "overall accuracy" to evaluate the validity of screening or diagnostic tests, J Gen Intern Med, 19, 460, 10.1111/j.1525-1497.2004.30091.x
Leeflang, 2009, Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis, J Clin Epidemiol, 62, 5, 10.1016/j.jclinepi.2008.04.007
Ransohoff, 1978, Problems of spectrum and bias in evaluating the efficacy of diagnostic tests, N Engl J Med, 299, 926, 10.1056/NEJM197810262991705
Dimatteo, 2001, The relationship between the clinical features of pharyngitis and the sensitivity of a rapid antigen test: evidence of spectrum bias, Ann Emerg Med, 38, 648, 10.1067/mem.2001.119850
Hall, 2004, Spectrum bias of a rapid antigen detection test for group A beta-hemolytic streptococcal pharyngitis in a pediatric population, Pediatrics, 114, 182, 10.1542/peds.114.1.182
Pretorius, 2007, Inappropriate gold standard bias in cervical cancer screening studies, Int J Cancer, 121, 2218, 10.1002/ijc.22991
Stein, 1993, Chest, 104, 1461, 10.1378/chest.104.5.1461
Taube, 1990, Over- and underestimation of the sensitivity of a diagnostic malignancy test due to various selections of the study population, Acta Oncol, 29, 1, 10.3109/02841869009091785
Brealey, 2007, Evidence of reference standard related bias in studies of plain radiograph reading performance: a meta-regression, Br J Radiol, 80, 406, 10.1259/bjr/41006673
Detrano, 1988, Methodologic problems in exercise testing research. Are we solving them?, Arch Intern Med, 148, 1289, 10.1001/archinte.1988.00380060053013
Ewald, 2006, Post hoc choice of cut points introduced bias to diagnostic research, J Clin Epidemiol, 59, 798, 10.1016/j.jclinepi.2005.11.025
Leeflang, 2008, Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions, Clin Chem, 54, 729, 10.1373/clinchem.2007.096032
Erly, 2003, Evaluation of emergency CT scans of the head: is there a community standard?, AJR Am J Roentgenol, 180, 1727, 10.2214/ajr.180.6.1801727
Moore, 2005, Clinical diagnostic accuracy and magnetic resonance imaging of patients referred by physical therapists, orthopaedic surgeons, and nonorthopaedic providers, J Orthop Sports Phys Ther, 35, 67, 10.2519/jospt.2005.35.2.67
Ciccone, 1992, Inter-observer and intra-observer variability of mammogram interpretation: a field study, Eur J Cancer, 28A, 1054, 10.1016/0959-8049(92)90455-B
Berbaum, 1989, Impact of clinical history on radiographic detection of fractures: a comparison of radiologists and orthopedists, AJR Am J Roentgenol, 153, 1221, 10.2214/ajr.153.6.1221
Cohen, 1987, Influence of training and experience in fine-needle aspiration biopsy of breast—receiver operating characteristic curve analysis, Arch Pathol Lab Med, 111, 518
Cuaron, 1980, Interobserver variability in the interpretation of myocardial images with Tc-99m-labeled diphosphonate and pyrophosphate, J Nucl Med, 21, 1
Elmore, 1994, Variability in radiologists' interpretation of mammograms, N Engl J Med, 331, 1493, 10.1056/NEJM199412013312206
Raab, 1995, Pathology and probability: likelihood ratios and receiver operating characteristic curves in the interpretation of bronchial brush specimens, Am J Clin Pathol, 103, 588, 10.1093/ajcp/103.5.588
Ronco, 1996, Estimating the sensitivity of cervical cytology: errors of interpretation and test limitations, Cytopathology, 7, 151, 10.1046/j.1365-2303.1996.39382393.x
Corley, 1997, Reproducibility of the histologic diagnosis of pneumonia among a panel of four pathologists: analysis of a gold standard, Chest, 112, 458, 10.1378/chest.112.2.458
Wardlaw, 2005, Early signs of brain infarction at CT: observer reliability and outcome after thrombolytic treatment. Systematic review (Structured abstract), Radiology, 235, 444, 10.1148/radiol.2352040262
Froelicher, 1998, Ann Intern Med, 128, 965, 10.7326/0003-4819-128-12_Part_1-199806150-00001
Suri, 2010, Bias in the physical examination of patients with lumbar radiculopathy, BMC Musculoskelet Disord, 11, 275, 10.1186/1471-2474-11-275
Berbaum, 1988, Impact of clinical history on fracture detection with radiography, Radiology, 168, 507, 10.1148/radiology.168.2.3393672
Doubilet, 1981, Interpretation of radiographs: effect of clinical history, AJR Am J Roentgenol, 137, 1055, 10.2214/ajr.137.5.1055
Eldevik, 1982, The effect of clinical bias on the interpretation of myelography and spinal computer tomography, Radiology, 145, 85, 10.1148/radiology.145.1.7122902
Potchen, 1979, The effect of clinical history data on chest film interpretation: direction or distraction, Invest Radiol, 14, 404
Schreiber, 1963, The clinical history as a factor in roentgenogram interpretation, JAMA, 185, 137, 10.1001/jama.1963.03060050077027
Raab, 2000, Effect of clinical history on diagnostic accuracy in the cytologic interpretation of bronchial brush specimens, Am J Clin Pathol, 114, 78, 10.1309/4099-QALD-NVGF-TM4G
Elmore, 1997, The impact of clinical history on mammographic interpretations, JAMA, 277, 49, 10.1001/jama.1997.03540250057032
Good, 1990, Does knowledge of the clinical history affect the accuracy of chest radiograph interpretation?, AJR Am J Roentgenol, 154, 709, 10.2214/ajr.154.4.2107662
Irwig, 2006, New methods give better estimates of changes in diagnostic accuracy when prior information is provided, J Clin Epidemiol, 59, 299, 10.1016/j.jclinepi.2005.08.013
Sonnad, 2001, Accuracy of MR imaging for staging prostate cancer: a meta-analysis to examine the effect of technologic change, Acad Radiol, 8, 149, 10.1016/S1076-6332(01)90095-9
Jedrzkiewicz, 2011, Three-dimensional transesophageal echocardiography accurately predicts mitral valve anatomy in patients undergoing repair for advanced mitral valve prolapse, J Am Soc Echocardiogr, 24, B44
Mannath, 2011, An inter-observer agreement study of autofluorescence endoscopy in Barrett's oesophagus among expert endoscopists, Gastrointest Endosc, 26, AB277, 10.1016/j.gie.2011.03.531
Davey, 2006, Effect of study design and quality on unsatisfactory rates, cytology classifications, and accuracy in liquid-based versus conventional cervical cytology: a systematic review, Lancet, 367, 122, 10.1016/S0140-6736(06)67961-0
van Rijkom, 1995, Factors involved in validity measurements of diagnostic tests for approximal caries—a meta-analysis, Caries Res, 29, 364, 10.1159/000262094
Philbrick, 2003, The d-dimer test for deep venous thrombosis: gold standards and bias in negative predictive value, Clin Chem, 49, 570, 10.1373/49.4.570
Arana, 1990, The effect of diagnostic methodology on the sensitivity of the TRH stimulation test for depression: a literature review, Biol Psychiatry, 28, 733, 10.1016/0006-3223(90)90460-J
Cagle, 2010, Use of an expanded gold standard to estimate the accuracy of colposcopy and visual inspection with acetic acid, Int J Cancer, 126, 156, 10.1002/ijc.24719
Choudhury, 2010, Assessing operating characteristics of CAD algorithms in the absence of a gold standard, Med Phys, 37, 1788, 10.1118/1.3352687
De Neef, 1987, Evaluating rapid test for streptococcal pharyngitis: the apparent accuracy of a diagnostic test when there are errors in the standard of comparison, Med Decis Making, 7, 92, 10.1177/0272989X8700700205
Boyko, 1988, Reference test errors bias the evaluation of diagnostic tests for ischemic heart disease, J Gen Intern Med, 3, 476, 10.1007/BF02595925
Thibodeau, 1981, Evaluating diagnostic tests, Biometrics, 801, 10.2307/2530161
Phelps, 1995, Estimating diagnostic test accuracy using a “fuzzy gold standard”, Med Decis Making, 15, 44, 10.1177/0272989X9501500108
van der Aa, 2010, Cystoscopy revisited as the gold standard for detecting bladder cancer recurrence: diagnostic review bias in the randomized, prospective CEFUB trial, J Urol, 183, 76, 10.1016/j.juro.2009.08.150
Gupta, 2004, Verification and incorporation biases in studies assessing screening tests: prostate-specific antigen as an example, Urology, 64, 106, 10.1016/j.urology.2004.02.025
Rosman, 2010, Effect of verification bias on the sensitivity of fecal occult blood testing: a meta-analysis, J Gen Intern Med, 25, 1211, 10.1007/s11606-010-1375-0
Mol, 1999, Effect of study design on the association between nuchal translucency measurement and Down syndrome, Obstet Gynecol, 94, 864, 10.1016/S0029-7844(99)00496-2
Ransohoff, 1982, Diagnostic work-up bias in the evaluation of a test: serum ferritin and hereditary hemochromatosis, Med Decis Making, 2, 139, 10.1177/0272989X8200200205
Cecil, 1996, The importance of work-up (verification) bias correction in assessing the accuracy of SPECT thallium-201 testing for the diagnosis of coronary artery disease, J Clin Epidemiol, 49, 735, 10.1016/0895-4356(96)00014-5
Panzer, 1987, Workup bias in prediction research, Med Decis Making, 7, 115, 10.1177/0272989X8700700209
Patel, 2009, Impact of verification bias on pulmonary hypertension using echocardiography and right heart catheterization, Chest, 136
Lauer, 2007, [18F]Fluorodeoxyglucose uptake by positron emission tomography for diagnosis of suspected lung cancer: impact of verification bias, Arch Intern Med, 167, 161, 10.1001/archinte.167.2.161
Diamond, 1991, Affirmative actions: can the discriminant accuracy of a test be determined in the face of selection bias?, Med Decis Making, 11, 48, 10.1177/0272989X9101100109
Diamond, 1992, Off Bayes: effect of verification bias on posterior probabilities calculated using Bayes' theorem, Med Decis Making, 12, 22, 10.1177/0272989X9201200105
Nishikawa, 2010, Verification bias in assessment of the utility of MRI in the diagnosis of cruciate ligament tears, AJR Am J Roentgenol, 195, W357, 10.2214/AJR.10.4189
Lijmer, 1996, ROC analysis of noninvasive tests for peripheral arterial disease, Ultrasound Med Biol, 22, 391, 10.1016/0301-5629(96)00036-1
Zhou, 1994, Effect of verification bias on positive and negative predictive values, Stat Med, 13, 1737, 10.1002/sim.4780131705
Alonzo, 2011, Bias in estimating accuracy of a binary screening test with differential disease verification, Stat Med, 30, 1852, 10.1002/sim.4232
Bowler, 1998, Fallacies in the pathological confirmation of the diagnosis of Alzheimer's disease, J Neurol Neurosurg Psychiatry, 64, 18, 10.1136/jnnp.64.1.18
Morise AP, Diamond GA, 1994, Does sex discrimination explain the differences in test accuracy among men and women referred for exercise electrocardiography?, 67th Scientific Sessions of the American Heart Association (Dallas, Texas, USA, November 1994), 90(4 Part 2)