MRMR-SSA: a hybrid approach for optimal feature selection
Abstract
Feature selection is a critical issue in data mining and machine learning. The central problem is identifying the most problem-relevant features among all the features contained in a dataset. Feature selection is one of the preprocessing steps of knowledge discovery in databases (the KDD process). It helps eliminate unnecessary (redundant) and unrelated (irrelevant) features in order to improve the performance of classification algorithms. It chooses the optimal number of features best suited to the classification model, which in turn improves the learning process, and as a result classification accuracy increases. In this paper, we therefore propose a two-stage hybrid model. The first stage uses a filter-based approach, minimum redundancy maximum relevance (MRMR), to remove unnecessary and unrelated features. The features it retains are then passed as input to the second stage, a wrapper method built on a recent swarm-based algorithm, the salp swarm algorithm (SSA). The proposed model is named MRMR-SSA. The binary version of SSA is used to evaluate features: each feature is either selected (1) or discarded (0). The classifiers used in this paper are XGBoost, AdaBoost, random forests and logistic regression, and accuracy is used to measure the performance of each classifier. The proposed hybrid feature selection approach is compared with several well-known hybrid algorithms, namely MRMR-PSO, MRMR-GA, MRMR-ALO and MRMR-ACO, and it outperforms these methods.
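The two stages described above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation. The function names (`mutual_information`, `mrmr`, `binary_ssa`), the toy data and every parameter value are hypothetical. The filter stage assumes discrete features and uses the difference form of the mRMR criterion (relevance I(f; c) minus the mean redundancy with already-selected features). The wrapper stage follows the leader/follower updates of the original salp swarm algorithm with a single leader. It maps continuous salp positions to 0/1 masks with an S-shaped (sigmoid) transfer function and minimizes a caller-supplied fitness. In MRMR-SSA that fitness would be derived from classifier accuracy on the MRMR-selected features. The Hamming-distance fitness below is only a stand-in so that the example runs without a classifier library.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in nats.
    Assumption: x and y hold discrete (categorical) values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(columns, labels, k):
    """Greedy mRMR (difference form): repeatedly add the feature that maximizes
    relevance I(f; c) minus its mean MI with the features already selected."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        def score(j):
            redundancy = sum(mutual_information(columns[j], columns[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        candidates = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


def binary_ssa(fitness, dim, n_salps=10, max_iter=30, lb=-6.0, ub=6.0, seed=0):
    """Minimize fitness(mask) over 0/1 masks with a binary salp swarm sketch.
    Assumptions: one leader salp, S-shaped transfer, position bounds [lb, ub]."""
    rng = random.Random(seed)

    def to_mask(x):
        # S-shaped (sigmoid) transfer: bit j is 1 with probability 1/(1+e^-x_j)
        return [1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0 for v in x]

    pos = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_salps)]
    masks = [to_mask(x) for x in pos]
    scores = [fitness(m) for m in masks]
    best = min(range(n_salps), key=lambda i: scores[i])
    food, food_mask, food_score = pos[best][:], masks[best], scores[best]

    for it in range(1, max_iter + 1):
        # c1 decays from about 2 toward 0, shifting from exploration to exploitation
        c1 = 2.0 * math.exp(-((4.0 * it / max_iter) ** 2))
        for i in range(n_salps):
            for j in range(dim):
                if i == 0:  # leader: random step around the food source
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    pos[i][j] = food[j] + step if rng.random() >= 0.5 else food[j] - step
                else:  # follower: midpoint of itself and the salp in front of it
                    pos[i][j] = (pos[i][j] + pos[i - 1][j]) / 2.0
                pos[i][j] = min(max(pos[i][j], lb), ub)  # keep within bounds
            mask = to_mask(pos[i])
            score = fitness(mask)
            if score < food_score:  # food source = best solution found so far
                food, food_mask, food_score = pos[i][:], mask, score
    return food_mask, food_score


if __name__ == "__main__":
    # Toy data: feature 1 duplicates feature 0; feature 2 is independent of
    # feature 0 but still informative about the label.
    a, c = [0, 0, 1, 1], [0, 1, 0, 1]
    labels = [0, 0, 0, 1]
    print(mrmr([a, a[:], c], labels, k=2))  # [0, 2]: the duplicate is skipped

    # Hypothetical stand-in fitness: Hamming distance to a known target mask.
    # A real wrapper would return, e.g., 1 - accuracy of a classifier trained
    # on the columns whose bit is 1.
    target = [1, 0, 1, 0, 1, 0]

    def hamming(mask):
        return sum(x != t for x, t in zip(mask, target))

    print(binary_ssa(hamming, dim=len(target)))
```

In a full pipeline, `mrmr` would first shrink the feature set, and `binary_ssa` would then search masks over only the retained columns. The intent is to reduce the cost of the more expensive wrapper search, where every fitness call trains a classifier.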