Discernible neighborhood counting based incremental feature selection for heterogeneous data

Yanyan Yang1, Shiji Song1, Degang Chen2, Xiao Zhang3
1Department of Automation, Tsinghua University, Beijing, People's Republic of China
2Department of Mathematics and Physics, North China Electric Power University, Beijing, People’s Republic of China
3Department of Applied Mathematics, School of Sciences, Xi’an University of Technology, Xi’an, People’s Republic of China

Tóm tắt

Incremental feature selection refreshes a subset of information-rich features from added-in samples without forgetting the previously learned knowledge. However, most existing algorithms for incremental feature selection have no explicit mechanisms to handle heterogeneous data with symbolic and real-valued features. Therefore, this paper presents an incremental feature selection method for heterogeneous data with the sequential arrival of samples in group. Discernible neighborhood counting that measures different types of features, is first introduced to establish a framework for feature selection from heterogeneous data. With the arrival of new samples, the discernible neighborhood counting of a feature subset is then updated to reveal the incremental feature selection scheme. This scheme determines the criterion for efficiently adding informative features and deleting redundant features. Based on the incremental scheme, our incremental feature selection algorithm is further formulated to select valuable features from heterogeneous data. Extensive experiments are finally conducted to demonstrate the effectiveness and the efficiency of the proposed incremental feature selection algorithm.

Tài liệu tham khảo

Qian YH, Liang JY, Pedrycz W, Dang CY (2010) Positive approximation: an accelerator for attribute reduction in rough set theory. Artif Intell 174(9):597–618 Zhang X, Mei CL, Chen DG, Yang YY (2018) A fuzzy rough set-based feature selection method using representative instances. Knowl Based Syst 151:216–229 Bell DA, Wang H (2000) A formalism for relevance and its application in feature subset selection. Mach Learn 41(2):175–195 Zeng AP, Li TR, Liu D, Zhang JB, Chen HM (2015) A fuzzy rough set approach for incremental feature selection on hybrid information systems. Fuzzy Sets Syst 258:39–60 Zhang JB, Zhu Y, Pan Y, Li TR (2016) Efficient parallel boolean matrix based algorithms for computing composite rough set approximations. Inf Sci 329:287–302 Zhang JB, Li TR, Chen HM (2014) Composite rough sets for dynamic data mining. Inf Sci 257:81–100 Tang WY, Mao KZ (2007) Feature selection algorithm for mixed data with both nominal and continuous features. Pattern Recogn Lett 28(5):563–571 Ching JY, Wong AKC, Chan KCC (1995) Class-dependent discretization for inductive learning from continuous and mixed-mode data. IEEE Trans Pattern Anal Mach Intell 17(7):641–651 Chmielewski MR, Grzymala-Busse JW (1996) Global discretization of continuous attributes as preprocessing for machine learning. Int J Approx Reason 15(4):319–331 Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106 Hu QH, Yu DR, Liu JF, Wu CX (2008) Neighborhood rough set based heterogeneous feature subset selection. Inf Sci 178(18):3577–3594 Chen DG, Yang YY (2014) Attribute reduction for heterogeneous data based on the combination of classical and fuzzy rough set models. IEEE Trans Fuzzy Syst 22(5):1325–1334 Zhang X, Mei CL, Chen DG, Li JH (2016) Feature selection in mixed data: a method using a novel fuzzy rough set-based information entropy. Pattern Recogn 56:1–15 Wang CZ, Hu QH, Wang XZ, Chen DG, Qian YH, Dong Z (2018) Feature selection based on neighborhood discrimination index. IEEE Trans Neural Netw Learn Syst 29(7):2986–2999 Wang CZ, He Q, Shao MW, Hu QH (2018) Feature selection based on maximal neighborhood discernibility. Int J Mach Learn Cybern 9(11):1929–1940 Wu Y, Hoi SCH, Mei T, Yu NH (2017) Large-scale online feature selection for ultra-high dimensional sparse data. ACM Trans Knowl Discov Data 11(4):1–13 Luo C, Li TR, Chen HM, Fujita H, Zhang Y (2018) Incremental rough set approach for hierarchical multicriteria classification. Inf Sci 429:72–87 Das AK, Sengupta S, Bhattacharyya S (2018) A group incremental feature selection for classification using rough set theory based genetic algorithm. Appl Soft Comput 65:400–411 Xie XJ, Qin XL (2018) A novel incremental attribute reduction approach for dynamic incomplete decision systems. Int J Approx Reason 93:443–462 Huang YY, Li TR, Luo C, Fujita H, Horng SJ (2017) Matrix-based dynamic updating rough fuzzy approximations for data mining. Knowl Based Syst 119(C):273–283 Hu CX, Liu SX, Liu GX (2017) Matrix-based approaches for dynamic updating approximations in multigranulation rough sets. Knowl Based Syst 122:51–63 Zhang YY, Li TR, Luo C, Zhang JB, Chen HM (2016) Incremental updating of rough approximations in interval-valued information systems under attribute generalization. Inf Sci 373:461–475 Hu J, Li TR, Luo C, Fujita H, Li SY (2016) Incremental fuzzy probabilistic rough sets over two universes. Int J Approx Reason 81:28–48 Luo C, Li TR, Chen HM, Fujita H, Zhang Y (2016) Efficient updating of probabilistic approximations with incremental objects. Knowl Based Syst 109:71–83 Chen HM, Li TR, Luo C, Horng SJ, Wang GY (2014) A rough set-based method for updating decision rules on attribute values’ coarsening and refining. IEEE Trans Knowl Data Eng 6(12):2886–2899 Chen HM, Li TR, Luo C, Horng SJ, Wang GY (2015) A decision-theoretic rough set approach for dynamic data mining. IEEE Trans Fuzzy Syst 23(6):1958–1970 Orlowska ME, Orlowski MW (1992) Maintenance of knowledge in dynamic information systems. Springer, Dordrecht, pp 315–329 Hu F, Wang GY, Huang H, Wu Y (2005) Incremental attribute reduction based on elementary sets. In: Slezak D, Wang G, Szczuka M, Duntsch I, Yao Y (eds) International conference on rough sets, fuzzy sets, data mining, and granular computing. Springer, Berlin, Heidelberg, pp 185–193 Hu F, Dai J, Wang GY (2007) Incremental algorithms for attribute reduction in decision table. Control Decis 22(3):268–272 Yang M (2007) An incremental updating algorithm for attribute reduction based on improved discernibility matrix. Chin J Comput 30(5):815–822 Feng SR, Zhang DZ (2012) Increment algorithm for attribute reduction based on improvement of discernibility matrix. J Shenzhen Univ Sci Eng 29:5 Shu WH, Shen H (2013) A rough-set based incremental approach for updating attribute reduction under dynamic incomplete decision systems. In: IEEE international conference on fuzzy systems. IEEE, Hyderabad, pp 1–7 Liang JY, Wang F, Dang CY, Qian YH (2013) A group incremental approach to feature selection applying rough set technique. IEEE Trans Knowl Data Eng 26(2):294–308 Chen DG, Yang YY, Dong Z (2016) An incremental algorithm for attribute reduction with variable precision rough sets. Appl Soft Comput 45:129–149 Yang YY, Chen DG, Wang H, Tsang ECC, Zhang DL (2016) Fuzzy rough set based incremental attribute reduction from dynamic data with sample arriving. Fuzzy Sets Syst 312:66–86 Yang YY, Chen DG, Wang H, Wang XZ (2018) Incremental perspective for feature selection based on fuzzy rough sets. IEEE Trans Fuzzy Syst 26(3):1257–1273 Yang YY, Chen DG, Wang H (2017) Active sample selection based incremental algorithm for attribute reduction with rough sets. IEEE Trans Fuzzy Syst 25(4):825–838 Lang GM, Li QG, Cai MJ, Yang T (2015) Characteristic matrixes-based knowledge reduction in dynamic covering decision information systems. Knowl Based Syst 85(C):1–26 Jing YG, Li TR, Luo C, Horng SJ, Wang GY, Yu Z (2016) An incremental approach for attribute reduction based on knowledge granularity. Knowl Based Syst 104(C):24–38 Jing YG, Li TR, Fujita H, Yu Z, Wang B (2017) An incremental attribute reduction approach based on knowledge granularity with a multi-granulation view. Inf Sci 411:23–38 Wang H (2006) Nearest neighbors by neighborhood counting. IEEE Trans Pattern Anal Mach Intell 28(6):942–953 Wu WZ, Zhang WX (2002) Neighborhood operator systems and approximations. Inf Sci 144(1):201–217 Zhu PF, Hu QH (2013) Adaptive neighborhood granularity selection and combination based on margin distribution optimization. Inf Sci 249:1–12 Wang CZ, Shi YP, Fan XD, Shao MW (2019) Attribute reduction based on k-nearest neighborhood rough sets. Int J Approx Reason 106:18–31 Wang CZ, Shao MW, He Q, Qian YH, Qi YL (2016) Feature subset selection based on fuzzy neighborhood rough sets. Knowl Based Syst 111:173–179