A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms

BMC Medical Informatics and Decision Making - 2020

André M. Carrington¹, Paul Fieguth², Hammad Ali Qazi³, Andreas Holzinger⁴, Helen Chen³, Franz Mayr⁵, Manuel Dujovny⁶

¹Ottawa Hospital Research Institute, Ottawa, K1H 8L6, Canada

²Faculty of Engineering, University of Waterloo, Waterloo, N2L 3G1, Canada

³School of Public Health and Health Systems, University of Waterloo, Waterloo, N2L 3G1, Canada

⁴Holzinger Group (HCAI), Institute for Medical Informatics/Statistics, Medical University Graz, 8036, Graz, Austria

⁵Universidad ORT Uruguay, 11100 Montevideo, Uruguay

⁶Department of Family Medicine, University of Ottawa, Ottawa, Canada

Abstract

Background: In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. When they are used with imbalanced data, however, only part of the ROC curve and AUC is informative. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. These alternatives cannot be interpreted as fully as the AUC, in part because they ignore some information about actual negatives.

Methods: We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data, as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and are validated as equal in summation to the whole measures where expected. Our partial measures are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. Interpretation of an example is then provided.

Results: Results show the expected equalities between our new partial measures and the existing whole measures. The example interpretation illustrates the need for our newly derived partial measures.

Conclusions: The concordant partial area under the ROC curve was proposed and, unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed. These measures may be used with any data set, but this paper focuses on imbalanced data with low prevalence.

Future work: Future work with our proposed measures may demonstrate their value for imbalanced data with high prevalence; compare them to other measures not based on areas; and combine them with other ROC measures and techniques.
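The abstract relies on the equality between the whole AUC (a continuous area) and the whole c statistic (a discrete count of concordant pairs), from which the partial measures are said to be derived. The following is a minimal sketch of that whole-measure equality only. It does not implement the paper's concordant partial AUC or partial c statistic, whose formulas are not given in the abstract. The function names and the small imbalanced data set are hypothetical and chosen for illustration.

```python
from itertools import product


def c_statistic(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos) * len(neg))


def roc_points(scores, labels):
    """Empirical ROC points (FPR, TPR), lowering the threshold one score at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Instances with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical imbalanced sample: 3 positives, 7 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

c = c_statistic(scores, labels)
auc = trapezoid_auc(roc_points(scores, labels))
print(c, auc)  # both are 17/21 up to floating-point rounding
```

In this sample, 17 of the 21 positive-negative pairs are concordant, and the trapezoidal area under the empirical ROC curve comes to the same value.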

References

Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.

Walter SD. The partial area under the summary ROC curve. Stat Med. 2005;24(13):2025–40. https://doi.org/10.1002/sim.2103.

Obuchowski NA, Bullen JA. Receiver operating characteristic (ROC) curves: review of methods with applications in diagnostic medicine. Phys Med Biol. 2018;63(7):07TR01.

Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett. 2006;27:861–74. https://doi.org/10.1016/j.patrec.2005.10.010.

Streiner DL, Cairney J. What’s under the ROC? An introduction to receiver operating characteristics curves. Can J Psychiatr. 2007;52(2):121–8.

Provost F, Fawcett T. Robust classification for imprecise environments. Mach Learn. 2001.

Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: Seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925–31. https://doi.org/10.1093/eurheartj/ehu207.

Austin PC, Steyerberg EW. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable. BMC Med Res Methodol. 2012;12(1):82.

Steyerberg EW, Kattan MW, Gonen M, Obuchowski N, Pencina MJ, Vickers AJ, Gerds T, Cook NR. Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology. 2009;21(1):128–38. https://doi.org/10.1097/ede.0b013e3181c30fb2.

Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Ramachandran SV. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27:157–72. https://doi.org/10.1002/sim.

Zhou X-H, McClish DK, Obuchowski NA. Statistical Methods in Diagnostic Medicine, vol. 569: John Wiley & Sons; 2009. p. 28.

McClish DK. Analyzing a Portion of the ROC Curve. Med Decis Mak. 1989:190–5.

Thompson ML, Zucchini W. On the statistical analysis of ROC curves. Stat Med. 1989;8:1277–90.

Wagstaff K. Machine learning that matters. arXiv preprint arXiv:1206.4656. 2012.

Lobo JM, Jiménez-Valverde A, Real R. AUC: a misleading measure of the performance of predictive distribution models. Glob Ecol Biogeogr. 2008;17:145–51. https://doi.org/10.1111/j.1466-8238.2007.00358.x.

McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Med Decis Mak. 1984;4(2):137–50.

McClish DK. Evaluation of the accuracy of medical tests in a region around the optimal point. Acad Radiol. 2012;19(12):1484–90. https://doi.org/10.1016/j.acra.2012.09.004.

Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology. 1996;201(3):745–50. https://doi.org/10.1148/radiology.201.3.8939225.

Tang Y, Zhang Y-Q, Chawla NV, Krasser S. SVMs modeling for highly imbalanced classification. IEEE Trans Syst Man Cybern B (Cybernetics). 2009;39(1):281–8.

Yang H, Lu K, Lyu X, Hu F. Two-way partial AUC and its properties. Stat Methods Med Res. 2019;28(1):184–95. https://doi.org/10.1177/0962280217718866.

Bradley AP. Half-AUC for the evaluation of sensitive or specific classifiers. Pattern Recogn Lett. 2014;38:93–8.

Wu T, Huang H, Du G, Sun Y. A novel partial area index of receiver operating characteristic (ROC) curve. Medical Imaging 2008: Image Perception, Observer Performance, and Technology Assessment. 2008;6917(69170):69170. https://doi.org/10.1117/12.769888.

Hu Y-C, Chen C-J. A PROMETHEE-based classification method using concordance and discordance relations and its application to bankruptcy prediction. Inf Sci. 2011;181(22):4959–68.

Joerin F, Musy A. Land management with GIS and multicriteria analysis. Int Trans Oper Res. 2000;7(1):67–78.

Legendre P. Species associations: the Kendall coefficient of concordance revisited. J Agric Biol Environ Stat. 2005;10(2):226.

Mendas A, Delali A. Integration of multicriteria decision analysis in GIS to develop land suitability for agriculture: application to durum wheat cultivation in the region of Mleta in Algeria. Comput Electron Agric. 2012;83:117–26.

Hilden J. The area under the ROC curve and its competitors. Med Decis Mak. 1991;11(2):95–101.

Dodd LE, Pepe MS. Partial AUC estimation and regression. Biometrics. 2003;59(3):614–23. https://doi.org/10.1111/1541-0420.00071.

Pepe MS. The statistical evaluation of medical tests for classification and prediction: Oxford University Press; 2003.

Hanley JA, Hajian-Tilaki KO. Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Acad Radiol. 1997;4(1):49–58.

DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45.

Harrell FE Jr, Califf RM, Prior DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. J Am Med Assoc. 1982;247(18):2543–6. https://doi.org/10.1001/jama.247.18.2543.

Vickers AJ, Cronin AM. Everything you always wanted to know about evaluating prediction models (but were too afraid to ask). Urology. 2010;76(6):1298–301.

Green DM, Swets JA, et al. Signal Detection Theory and Psychophysics, vol. 1: Wiley New York; 1966.

Hosmer DW, Lemeshow S. Applied Logistic Regression; 2000. p. 160–165, 173, 180.

Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei L. On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105–17.

Brentnall AR, Cuzick J. Use of the concordance index for predictors of censored survival data. Stat Methods Med Res. 2018;27(8):2359–73.

Steyerberg EW. Clinical prediction models. Springer. 2009.

Michalski RS, Mozetic I, Hong J, Lavrac N. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. Proc AAAI. 1986;1986:1–041.

Wolberg WH, Mangasarian OL. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc Natl Acad Sci. 1990;87(23):9193–6.

Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 1997;30:1145–59.

Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. PhD thesis, The University of Queensland.

Metz CE, Kronman HB. Statistical significance tests for binormal ROC curves. J Math Psychol. 1980;22(3):218–43.

Pérez-Fernández S, Martínez-Camblor P, Filzmoser P, Corral N. nsROC: an R package for non-standard ROC curve analysis. R J. 2018;10(2):55–77.

Ozenne B, Subtil F, Maucort-Boulch D. The precision–recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol. 2015;68(8):855–9.

Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015;10(3):1–21. https://doi.org/10.1371/journal.pone.0118432.

Dua D, Graff C. UCI machine learning repository; 2017. http://archive.ics.uci.edu/ml
