IMSR_PreTree: an improved algorithm for mining sequential rules based on the prefix-tree
Tóm tắt
Sequential rules generated from sequential patterns express temporal relationships among patterns. Sequential rule mining is an important research problem because it has broad application such as the analyses of customer purchases, web log, DNA sequences, and so on. However, developing an efficient algorithm for mining sequential rules is a difficult problem due to the large size of the sequential pattern set. The larger the sequential pattern set, the longer the mining time. In this paper, we propose a new algorithm called IMSR_PreTree which is an improved algorithm of MSR_PreTree that mines sequential rules based on prefix-tree. IMSR_PreTree also generates rules from frequent sequences stored in a prefix-tree but it prunes the sub trees which give non-significant rules very early in the process of rule generation and avoids tree scanning as much as possible. Thus, IMSR_PreTree can significantly reduce the search space during the mining process. Our performance study shows that IMSR_PreTree outperforms MSR_PreTree, especially on large sequence databases.
Tài liệu tham khảo
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th international conference very large data, bases, pp. 487–499 (1994)
Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the 11th international conference on data engineering, pp. 3–14. IEEE (1995)
Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 429–435. ACM (2002)
Baralis, E., Chiusano, S., Dutto, R.: Applying sequential rules to protein localization prediction. Comput. Math. Appl. 55(5), 867–878 (2008)
Vo, B., Hong, T.P., Le, B.: A lattice-based approach for mining most generalization association rules. Knowl. Based Syst. 45, 20–30 (2013)
El-Sayed, M., Ruiz, C., Rundensteiner, E.A.: FS-Miner: efficient and incremental mining of frequent sequence patterns in web logs. In: Proceedings of the 6th annual ACM international workshop on web information and data management, pp. 128–135 (2004)
Ezeife, C.I., Lu, Y., Liu, Y.: PLWAP sequential mining: open source code. In: Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, pp. 26–35 (2005)
Fournier-Viger, P., Faghihi, U., Nkambou, R., Nguifo, E.M.: CMRules: mining sequential rules common to several sequences. Knowl. Based Syst. 25(1), 63–76 (2012)
Fournier-Viger, P., Nkambou, R., Tseng, V.S.M.: RuleGrowth: mining sequential rules common to several sequences by pattern-growth. In: Proceedings of the 2011 ACM symposium on applied computing, pp. 956–961 (2011)
Fournier-Viger, P., Wu, C.W., Tseng, V.S., Nkambou, R.: Mining sequential rules common to several sequences with the window size constraint. Adv. Artif. Intell. 299–304 (2012)
Gouda, K., Hassaan, M., Zaki, M.J.: Prism: an effective approach for frequent sequence mining via prime-block encoding. J. Comput. Syst. Sci. 76(1), 88–102 (2010)
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.C. FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 355–359 (2000)
Lo, D., Khoo, S.-C., Liu, C.: Efficient mining of recurrent rules from a sequence database. In: DASFAA 2008, LNCS vol. 4947, pp. 67–83 (2008)
Lo, D., Khoo, S.C., Wong, L.: Non-redundant sequential rules—theory and algorithm. Inf. Syst. 34(4), 438–453 (2009)
Masseglia, F., Cathala, F., Poncelet, P.: The PSP approach for mining sequential patterns. In: PKDD’98, Nantes, France, LNCS vol. 1510, pp. 176–184 (1998)
Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Hsu, M.C.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. 16(11), 1424–1440 (2004)
Pei, J., Han, J., Mortazavi-Asl, B., Zhu, H.: Mining access patterns efficiently from web logs. In: Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’00), Kyoto, Japan, pp. 396–407 (2000)
Spiliopoulou, M.: Managing interesting rules in sequence mining. In: Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery, Prague, Czech Republic, pp. 554–560 (1999)
Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, Avignon, France, LNCS, pp. 3–17 (1996)
UCI Machine Learning Repository. http://www.ics.uci.edu/mlearn/MLRepository.html
Van, T.T., Vo, B., Le, B.: Mining sequential rules based on prefix-tree. In: Proceedings of the 3rd Asian Conference on Intelligent Information and Database Systems, Daegu, Korea, pp. 147–156 (2011)
Zaki, M.J.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1–2), 31–60 (2001)
