Quantifying the added value of new biomarkers: how and how not

Diagnostic and Prognostic Research - Volume 2 - Pages 1-7 - 2018
Nancy R. Cook (1)
(1) Division of Preventive Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA

Abstract

Over the past few decades, interest in biomarkers to enhance predictive modeling has soared. Methodology for evaluating these biomarkers has also been an active area of research, and several performance measures are now available for quantifying their added value. This commentary provides an overview of methods currently used to evaluate new biomarkers, describes their strengths and limitations, and offers some suggestions on their use.
