Local support-based partition algorithm for frequent pattern mining
Tóm tắt
Frequent pattern (itemset) mining is one of the established approaches for knowledge discovery. Minimizing the number of database scans (I/O overhead) is a challenging task in Frequent itemset mining. Partition algorithm is one of the early novel approaches to reduce the database I/O overhead as compared to Apriori algorithm and other related methods. However, Partition algorithm suffers from a significant database I/O overhead (that is, it reads the database twice from the secondary storage) and higher time complexity for computation of frequent itemsets in large databases. In this work, an improved partition algorithm is proposed, which reads the database only once and makes use of local support information to avoid further scans of the database. The proposed algorithm outperforms Apriori and Partition algorithms and shows closer performance to FP-Growth algorithm, in terms of computational time. The proposed method outpaces FP-Growth algorithm in terms of memory usage and is competitive to other algorithms. In terms of database access time, the proposed method exhibits better performance over FP-Growth, Partition and Apriori methods.
Tài liệu tham khảo
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1):55–86
Zaki MJ (1999) Parallel and distributed association mining: a survey. IEEE Concur 7(4):14–25
Boukerche A, Samarah S (2008) A novel algorithm for mining association rules in wireless ad hoc sensor networks. IEEE Trans Parallel Distrib Syst 19(7):865–877
Gu X, Zhu Y, Zhou S, Wang C, Qiu M, Wang G (2016) A real-time fpga-based accelerator for ecg analysis and diagnosis using association-rule mining. ACM Trans Embed Comput Syst 15(2):25:1–25:23
Vathsala H, Koolagudi SG (2017) Prediction model for peninsular indian summer monsoon rainfall using data mining and statistical approaches. Comput Geosci 98:55–63
Weng C-H (2016) Discovering highly expected utility itemsets for revenue prediction. Knowl Based Syst 104:39–51
Qu Z, Keeney J, Robitzsch S, Zaman F, Wang X (2016) Multilevel pattern mining architecture for automatic network monitoring in heterogeneous wireless communication networks. China Commun 13(7):108–116
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases, VLDB, Santiago, Chile, pp 487–499
Pujari AK (2013) Data mining techniques, 2nd edn. Universities Press (India) Pvt. Ltd., Hyderabad
Brin S, Motwani R, Ullman J, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. ISIGMOD Record 6(2):255–264
Lin DI, Kedem ZM (2002) Pincer-search: an efficient algorithm for discovering the maximum frequent set. IEEE Trans Knowl Data Eng 14(3):553–566
Savasere A, Omiecinski E, Navathe SB (1995) An efficient algorithm for mining association rules in large databases. In: Proceedings of the 21th international conference on very large data bases, ser. VLDB’95, pp 432–444
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, ser. SIGMOD’00. ACM, New York, NY, pp 1–12
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
Belbachir K, Belbachir H (2012) The parallelization of algorithm based on partition principle for association rules discovery. In 2012 international conference on multimedia computing and systems, May 2012, pp 1–6
Amphawan K, Lenca P, Surarerks A (2012) Mining top-k regular-frequent itemsets using database partitioning and support estimation. Expert Syst Appl 39(2):1924–1936
Liu C, Liu Y (2008) Two phases association rule mining of remote sensing images based partition and bitset. In: 2008 IEEE international symposium on knowledge acquisition and modeling workshop, Dec 2008, pp 161–164
Subramanyam RBV, Suvvari SR (2012) Partition-based approach for fast mining of transitional patterns. In: 2012 UKSim 14th international conference on computer modelling and simulation, Mar 2012, pp 151–155
Lee C-H, Chen M-S, Lin C-R (2003) Progressive partition miner: an efficient algorithm for mining general temporal association rules. IEEE Trans Knowl Data Eng 15(4):1004–1017
Li Z, Liu J, Tang J, Lu H (2015) Robust structured subspace learning for data representation. IEEE Trans Pattern Anal Mach Intell 37(10):2085–2098
Li Z, Liu J, Yang Y, Zhou X, Lu H (2014) Clustering-guided sparse structural learning for unsupervised feature selection. IEEE Trans Knowl Data Eng 26(9):2138–2150
FIM-Repository. Frequent itemset mining dataset repository. http://fimi.ua.ac.be/data/, Visited in Dec 2017
pyfpgrowth 1.0 A python implementation of the frequent pattern growth algorithm. https://pypi.python.org/pypi/pyfpgrowth. Visited in Dec 2017
Koshy T (2004) Discrete mathematics with applications. Academic Press, Burlington