pROC: an open-source package for R and S+ to analyze and compare ROC curves
Abstract
Receiver operating characteristic (ROC) curves are useful tools for evaluating classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curve analysis, we developed pROC, a package for R and S+ that provides a set of tools for displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface.
With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing the total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC.
pROC is a package for R and S+ specifically dedicated to ROC analysis. It offers multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, facilitating its installation.
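For readers unfamiliar with the quantities named above, the following minimal Python sketch shows how an empirical ROC curve and its area under the curve can be computed from class labels and classifier scores. It is an illustration only and is not taken from pROC: the function names `roc_points` and `auc_trapezoid` are hypothetical, the data are made up, and the sketch assumes that higher scores indicate cases (label 1) and lower scores indicate controls (label 0).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Return the points of the empirical ROC curve, from (0, 0) to (1, 1).

    Assumption for this sketch: label 1 marks a case, label 0 a control,
    and a higher score means "more likely a case".
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    # Sort by decreasing score: lowering the threshold step by step
    # walks the curve from (0, 0) towards (1, 1).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Point]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Made-up example data: three controls (0) and three cases (1).
    labels = [0, 0, 1, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
    curve = roc_points(labels, scores)
    print(curve)
    print(auc_trapezoid(curve))
```

The trapezoidal area equals the proportion of case-control pairs in which the case receives the higher score, with ties counted as one half. In the example data, 6 of the 9 pairs are ordered correctly. A partial area under the curve restricts the same sum to a chosen range of the horizontal axis.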