A multi-objective feature selection method based on bacterial foraging optimization

Springer Science and Business Media LLC - Volume 20 - Pages 63-76 - 2019
Ben Niu1, Wenjie Yi1, Lijing Tan1, Shuang Geng1, Hong Wang1
1College of Management, Shenzhen University, Shenzhen, China

Abstract

Feature selection plays an important role in data preprocessing. Its aim is to identify and remove redundant or irrelevant features; the key issue is to use as few features as possible while achieving the lowest classification error rate. This paper formulates feature selection as a multi-objective problem. To address it, the multi-objective bacterial foraging optimization algorithm is used to select feature subsets, with the k-nearest neighbor algorithm serving as the evaluation method. A roulette wheel mechanism is further introduced to remove duplicated features. Four information exchange mechanisms are integrated into the bacteria-inspired algorithm to prevent individuals from getting trapped in local optima, so as to achieve better results on high-dimensional feature selection problems. Comparative experiments on six small datasets and ten high-dimensional datasets against conventional wrapper methods and several evolutionary algorithms demonstrate the superiority of the proposed bacteria-inspired feature selection method.
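The wrapper formulation described above can be illustrated with a short sketch: a candidate feature subset is encoded as a binary mask and scored on the two objectives named in the abstract, the k-nearest-neighbor classification error and the number of selected features. This is an illustrative sketch only, assuming scikit-learn's KNeighborsClassifier and a small sample dataset; it does not reproduce the paper's bacterial foraging search, roulette wheel duplicate removal, or information exchange mechanisms.

```python
# Sketch of the two wrapper objectives from the abstract:
# (1) k-NN classification error, (2) number of selected features.
# The bacterial foraging search itself is not reproduced here; any search
# strategy could call this evaluation on a binary feature mask.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def evaluate_subset(mask, X, y, k=5, folds=5):
    """Return (error_rate, n_features) for a binary feature mask."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:                      # empty subsets are infeasible
        return 1.0, 0
    knn = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(knn, X[:, selected], y, cv=folds).mean()
    return 1.0 - acc, selected.size


if __name__ == "__main__":
    X, y = load_wine(return_X_y=True)           # small UCI-style dataset
    rng = np.random.default_rng(0)
    mask = rng.integers(0, 2, size=X.shape[1])  # one random candidate subset
    error, n_feat = evaluate_subset(mask, X, y)
    print(f"error rate = {error:.3f}, features used = {n_feat}")
```

Because the two objectives conflict (using fewer features generally raises the error rate), a multi-objective search compares candidate subsets by Pareto dominance rather than collapsing them into a single weighted score.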
