The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets
Abstract
Keywords
References
AL Tarca, 2007, Machine learning and its applications to biology, PLoS Comput Biol, 3, e116, 10.1371/journal.pcbi.0030116
A Ben-Hur, 2008, Support vector machines and kernels for computational biology, PLoS Comput Biol, 4, e1000173, 10.1371/journal.pcbi.1000173
JA Hanley, 1982, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, 29, 10.1148/radiology.143.1.7063747
H He, 2009, Learning from Imbalanced Data, IEEE Trans Knowl Data Eng, 21, 1263, 10.1109/TKDE.2008.239
N Chawla, 2004, Editorial: Special Issue on Learning from Imbalanced Data Sets, SIGKDD Explor, 6
NV Chawla, 2002, SMOTE: synthetic minority over-sampling technique, J Artif Intell Res, 16, 321, 10.1613/jair.953
M Kubat, 1998, Machine Learning for the Detection of Oil Spills in Satellite Radar Images, Mach Learn, 30, 195, 10.1023/A:1007452223027
F Provost, 2000, Machine learning from imbalanced data sets 101, Proceedings of the AAAI-2000 Workshop on Imbalanced Data Sets
J Van Hulse, 2007, Experimental perspectives on learning from imbalanced data, Proceedings of the 24th International Conference on Machine Learning, 935
H Guo, 2004, Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach, SIGKDD Explor, 6, 30, 10.1145/1007730.1007736
M Kubat, 1997, Addressing the curse of imbalanced training sets: one-sided selection, Proceedings of the Fourteenth International Conference on Machine Learning, 179
C Ling, 1998, Data Mining for Direct Marketing: Problems and Solutions, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 73
C Elkan, 2001, The foundations of cost-sensitive learning, Proceedings of the 17th international joint conference on Artificial intelligence, Volume 2, 973
Y Sun, 2007, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognit, 40, 3358, 10.1016/j.patcog.2007.04.009
N Japkowicz, 2002, The class imbalance problem: A systematic study, Intell Data Anal, 6, 429, 10.3233/IDA-2002-6504
X Hong, 2007, A kernel-based two-class classifier for imbalanced data sets, IEEE Trans Neural Netw, 18, 28, 10.1109/TNN.2006.882812
G Wu, 2003, Class-Boundary Alignment for Imbalanced Dataset Learning, Workshop on Learning from Imbalanced Datasets in ICML
A Estabrooks, 2004, A Multiple Resampling Method for Learning from Imbalanced Data Sets, Comput Intell, 20, 18, 10.1111/j.0824-7935.2004.t01-1-00228.x
A Ben-Hur, 2010, A user's guide to support vector machines, Methods Mol Biol, 609, 223, 10.1007/978-1-60327-241-4_13
B Mac Namee, 2002, The problem of bias in training data in regression problems in medical decision support, Artif Intell Med, 24, 51, 10.1016/S0933-3657(01)00092-6
K Soreide, 2009, Receiver-operating characteristic curve analysis in diagnostic, prognostic and predictive biomarker research, J Clin Pathol, 62, 1, 10.1136/jcp.2008.061010
T Fawcett, 2006, An introduction to ROC analysis, Pattern Recognit Lett, 27, 861, 10.1016/j.patrec.2005.10.010
JA Swets, 1988, Measuring the accuracy of diagnostic systems, Science, 240, 1285, 10.1126/science.3287615
J Davis, 2006, The relationship between Precision-Recall and ROC curves, Proceedings of the 23rd international conference on Machine learning, 233, 10.1145/1143844.1143874
SJ Swamidass, 2010, A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval, Bioinformatics, 26, 1348, 10.1093/bioinformatics/btq140
C Drummond, 2000, Explicitly Representing Expected Cost: An Alternative to ROC Representation, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 198, 10.1145/347090.347126
D Berrar, 2012, Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them), Brief Bioinform, 13, 83, 10.1093/bib/bbr008
TH Huang, 2007, MiRFinder: an improved approach and software implementation for genome-wide fast microRNA precursor scans, BMC Bioinformatics, 8, 341, 10.1186/1471-2105-8-341
DG Altman, 1994, Diagnostic tests. 1: Sensitivity and specificity, BMJ, 308, 1552, 10.1136/bmj.308.6943.1552
P Baldi, 2000, Assessing the accuracy of prediction algorithms for classification: an overview, Bioinformatics, 16, 412, 10.1093/bioinformatics/16.5.412
C Goutte, 2005, A probabilistic interpretation of precision, recall and F-score, with implication for evaluation, Advances in Information Retrieval, 345, 10.1007/978-3-540-31865-1_25
M Hall, 2009, The WEKA data mining software: an update, SIGKDD Explor, 11, 10, 10.1145/1656274.1656278
C-C Chang, 2011, LIBSVM: A library for support vector machines, ACM Trans Intell Syst Technol, 2, 1, 10.1145/1961189.1961199
J Hilden, 1991, The area under the ROC curve and its competitors, Med Decis Making, 11, 95, 10.1177/0272989X9101100204
JF Truchon, 2007, Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem, J Chem Inf Model, 47, 488, 10.1021/ci600426e
M Gribskov, 1996, Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching, Comput Chem, 20, 25, 10.1016/S0097-8485(96)80004-0
S Macskassy, 2004, Confidence bands for ROC curves: Methods and an empirical study, Proceedings of the First Workshop on ROC Analysis in AI
T Sing, 2005, ROCR: visualizing classifier performance in R, Bioinformatics, 21, 3940, 10.1093/bioinformatics/bti623
R Ihaka, 1996, R: A Language for Data Analysis and Graphics, J Comput Graph Stat, 5, 299, 10.1080/10618600.1996.10474713
RC Gentleman, 2004, Bioconductor: open software development for computational biology and bioinformatics, Genome Biol, 5, R80, 10.1186/gb-2004-5-10-r80
PE Meyer, 2008, minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information, BMC Bioinformatics, 9, 461, 10.1186/1471-2105-9-461
JN Hirschhorn, 2005, Genome-wide association studies for common diseases and complex traits, Nat Rev Genet, 6, 95, 10.1038/nrg1521
AR Gruber, 2010, RNAz 2.0: improved noncoding RNA detection, Pac Symp Biocomput, 69
A Kozomara, 2011, miRBase: integrating microRNA annotation and deep-sequencing data, Nucleic Acids Res, 39, D152, 10.1093/nar/gkq1027
P Jiang, 2007, MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features, Nucleic Acids Res, 35, W339, 10.1093/nar/gkm368
J Hertel, 2006, Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data, Bioinformatics, 22, e197, 10.1093/bioinformatics/btl257
JW Nam, 2005, Human microRNA prediction through a probabilistic co-learning model of sequence and structure, Nucleic Acids Res, 33, 3570, 10.1093/nar/gki668
I Hofacker, 1994, Fast Folding and Comparison of RNA Secondary Structures, Monatsh Chem, 125, 167, 10.1007/BF00818163
B Boser, 1992, A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory, 144, 10.1145/130385.130401
SJ Raudys, 1991, Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners, IEEE Trans Pattern Anal Mach Intell, 13, 252, 10.1109/34.75512
DP Bartel, 2004, MicroRNAs: genomics, biogenesis, mechanism, and function, Cell, 116, 281, 10.1016/S0092-8674(04)00045-5