Improved gene expression diagnosis via cascade entropy-fisher score and ensemble classifiers
Multimedia Tools and Applications, pages 1-20
Abstract
Feature selection is an important technique in bioinformatics modeling for reducing the dimensionality of high-dimensional data. However, filter-based approaches that have shown good performance often depend on a specific measurement method, which can limit their effectiveness. To address this problem, this paper proposes a novel cascade feature selection approach, the cascade entropy-Fisher score (CEFS), which combines entropy score (ES)-based and Fisher score (FS)-based feature selection. CEFS is a two-step process. In the first step, the entropy of each gene in the dataset is calculated to measure the uncertainty associated with its expression levels across samples. In the second step, the Fisher score is computed to measure the extent to which the gene's expression levels differ between classes of samples. CEFS has been shown to outperform other methods in identifying disease-specific genes in gene expression datasets, making it a promising tool for disease diagnosis and prognosis. The proposed method was evaluated on biomedical datasets, and its effectiveness was measured in terms of accuracy, sensitivity, specificity, and area under the curve (AUC). The results showed that CEFS performs comparably to state-of-the-art feature selection methods in the literature. Additionally, the selected features were fed to an ensemble of three classifiers, namely a support vector machine (SVM), k-nearest neighbor (k-NN), and a decision tree (DT), to evaluate performance in the classification stage. The ensemble is based on majority voting, which aggregates the outputs of the individual classifiers to determine the final label. The results demonstrate the potential of CEFS in machine learning applications, particularly for disease diagnosis and prognosis.
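The two-step cascade and the majority vote described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how expression values are discretized for the entropy step, whether high- or low-entropy genes are retained, or how many genes survive each stage. The function names, the equal-width binning, the "keep the highest-entropy genes" rule, and the parameters `n_bins`, `n_entropy`, and `n_final` are all hypothetical choices made for this sketch.

```python
import math
from collections import Counter


def entropy_score(values, n_bins=4):
    """Shannon entropy (bits) of one gene after equal-width binning (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant gene carries no uncertainty
    width = (hi - lo) / n_bins
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


def fisher_score(values, labels):
    """Between-class scatter of the class means divided by within-class scatter."""
    overall = sum(values) / len(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall) ** 2
        within += sum((v - mean) ** 2 for v in group)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def cefs_select(X, labels, n_entropy, n_final, n_bins=4):
    """Cascade: keep the n_entropy highest-entropy genes (assumed direction),
    then return the n_final of those with the highest Fisher score.

    X is a list of samples, each a list of gene expression values.
    """
    genes = list(zip(*X))  # column view: one tuple of values per gene
    stage1 = sorted(range(len(genes)),
                    key=lambda g: entropy_score(genes[g], n_bins),
                    reverse=True)[:n_entropy]
    stage2 = sorted(stage1,
                    key=lambda g: fisher_score(genes[g], labels),
                    reverse=True)
    return stage2[:n_final]


def majority_vote(*predictions):
    """Combine per-classifier label lists into one label per sample.

    On a tie, the label seen first among the tied ones wins.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data: gene 0 separates the classes, gene 1 is constant, gene 2 is noise.
X = [
    [1.0, 2.0, 1.0],
    [1.1, 2.0, 3.0],
    [0.9, 2.0, 5.0],
    [5.0, 2.0, 1.0],
    [5.1, 2.0, 3.0],
    [4.9, 2.0, 5.0],
]
y = [0, 0, 0, 1, 1, 1]
selected = cefs_select(X, y, n_entropy=2, n_final=1)
```

On the toy data, the entropy stage discards the constant gene, and the Fisher stage then prefers the class-separating gene over the noisy one. In practice, the three label lists passed to `majority_vote` would come from trained SVM, k-NN, and DT classifiers applied to the selected genes.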