A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems

Soft Computing - Volume 24 - Pages 4675–4691 - 2019
Shivani Singh¹, Shivam Shreevastava², Tanmoy Som², Gaurav Somani²
¹DST-Centre for Interdisciplinary Mathematical Sciences, Institute of Science, BHU, Varanasi, India
²Department of Mathematical Sciences, IIT (BHU), Varanasi, India

Abstract

Databases obtained from search engines, market data, and records of patients' symptoms and behaviours are common examples of set-valued data, in which a set of values is associated with a single entity. In the real-world data deluge, irrelevant attributes reduce the ability of experts to work with the data in both speed and predictive accuracy, owing to high dimensionality and uninformative content, respectively. Attribute selection is the task of choosing attributes that are, ideally, both necessary and sufficient to describe the target knowledge. Rough set-based approaches can handle the uncertainty present in real-valued information systems after a discretization step. In this paper, we introduce a novel approach to attribute selection in set-valued information systems based on tolerance rough set theory. We define a fuzzy tolerance relation between two objects using a similarity threshold, and we find reducts with the degree-of-dependency method in order to select the best attribute subsets and extract more knowledge from the information system. As validation, we establish results for the proposed method that are analogous to those of classical rough set theory. We also present a greedy algorithm, with illustrative examples, that demonstrates the approach without checking every pair of attributes in a set-valued decision system. Examples of computing a reduct of an incomplete information system with the proposed approach are also given. Finally, the proposed approach is compared with fuzzy rough-assisted attribute selection on one real benchmark dataset, and with three existing attribute selection approaches on six real benchmark datasets, to demonstrate its superiority.
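The abstract names three ingredients: a fuzzy tolerance relation built from a similarity threshold, the degree of dependency, and a greedy reduct search. The Python sketch below shows one way these pieces can fit together. It is not the authors' method: the Jaccard-style set similarity, the minimum taken across attributes, the rule that sets a degree to 0 below the threshold `alpha`, and the Łukasiewicz-style lower approximation are all assumptions chosen for illustration. The function names and the toy table are hypothetical.

```python
"""Sketch of fuzzy-tolerance attribute selection for set-valued data.

All modelling choices here (Jaccard similarity, min aggregation,
cut-to-zero below alpha, Lukasiewicz-style lower approximation) are
assumptions for illustration, not the paper's exact definitions.
"""


def set_similarity(a, b):
    """Assumed Jaccard-style similarity |a & b| / |a | b| between set values."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def tolerance(table, attrs, x, y, alpha):
    """Hypothetical fuzzy tolerance degree of objects x and y over attrs.

    Takes the minimum per-attribute similarity and sets it to 0 when it
    falls below the threshold alpha. An empty attribute set relates everything.
    """
    if not attrs:
        return 1.0
    s = min(set_similarity(table[x][a], table[y][a]) for a in attrs)
    return s if s >= alpha else 0.0


def dependency(table, attrs, labels, alpha):
    """Fuzzy degree of dependency: mean positive-region membership.

    With crisp decision classes and the Lukasiewicz implicator, the lower
    approximation of x's own class is the minimum of 1 - R(x, y) over
    objects y with a different label. Because R(x, x) = 1, every other
    class has lower-approximation membership 0 at x, so only x's own
    class contributes to the positive region.
    """
    n = len(table)
    total = 0.0
    for x in range(n):
        total += min(
            (1.0 - tolerance(table, attrs, x, y, alpha)
             for y in range(n) if labels[y] != labels[x]),
            default=1.0,
        )
    return total / n


def greedy_reduct(table, conditions, labels, alpha, eps=1e-12):
    """Greedy forward selection (QuickReduct-style) on the dependency degree."""
    target = dependency(table, conditions, labels, alpha)
    reduct = []
    best = dependency(table, reduct, labels, alpha)
    while best < target - eps:
        choice, choice_val = None, best
        for a in conditions:
            if a in reduct:
                continue
            val = dependency(table, reduct + [a], labels, alpha)
            if val > choice_val:
                choice, choice_val = a, val
        if choice is None:  # no single attribute improves the degree
            break
        reduct.append(choice)
        best = choice_val
    return reduct


# Hypothetical toy set-valued decision table (4 objects, 3 attributes).
TABLE = [
    {"a": {1, 2}, "b": {1}, "c": {1, 2, 3}},
    {"a": {1, 2}, "b": {2}, "c": {1, 2}},
    {"a": {3}, "b": {1}, "c": {1, 2, 3}},
    {"a": {3, 4}, "b": {2}, "c": {1}},
]
LABELS = [0, 0, 1, 1]

if __name__ == "__main__":
    print(greedy_reduct(TABLE, ["a", "b", "c"], LABELS, alpha=0.5))
```

Run on the toy table, the script prints `['a']`, because attribute `a` alone already separates the two decision classes. Under these assumptions the thresholded relation can only shrink as attributes are added, so the dependency degree never decreases. The greedy loop therefore stops once it reaches the dependency of the full attribute set, or when no single remaining attribute improves it.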
