Iterative ensemble feature selection for multiclass classification of imbalanced microarray data

Jun Yang1, Jian Zhou2, Zexuan Zhu3, Xiaoliang Ma1, Zhen Ji1
1College of Engineering and Information, Shenzhen University, Shenzhen, People's Republic of China
2School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, People's Republic of China
3College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, People's Republic of China

Abstract

Keywords

