Finding the Best Classification Threshold in Imbalanced Classification

Big Data Research - Volume 5 - Pages 2-8 - 2016
Quan Zou (1, 2), Sifa Xie (2), Ziyu Lin (2), Meihong Wu (2), Ying Ju (2)
(1) School of Computer Science and Technology, Tianjin University, Tianjin, China
(2) Department of Computer Science, Xiamen University, Xiamen, China
