IMSR_PreTree: an improved algorithm for mining sequential rules based on the prefix-tree

Springer Science and Business Media LLC - Tập 1 - Trang 97-105 - 2014

Thien-Trang Van¹, Bay Vo², Bac Le³

¹Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh, Vietnam

²Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh, Vietnam

³Faculty of Information Technology, University of Science, VNU, Ho Chi Minh, Vietnam

Tóm tắt

Sequential rules generated from sequential patterns express temporal relationships among patterns. Sequential rule mining is an important research problem because it has broad application such as the analyses of customer purchases, web log, DNA sequences, and so on. However, developing an efficient algorithm for mining sequential rules is a difficult problem due to the large size of the sequential pattern set. The larger the sequential pattern set, the longer the mining time. In this paper, we propose a new algorithm called IMSR_PreTree which is an improved algorithm of MSR_PreTree that mines sequential rules based on prefix-tree. IMSR_PreTree also generates rules from frequent sequences stored in a prefix-tree but it prunes the sub trees which give non-significant rules very early in the process of rule generation and avoids tree scanning as much as possible. Thus, IMSR_PreTree can significantly reduce the search space during the mining process. Our performance study shows that IMSR_PreTree outperforms MSR_PreTree, especially on large sequence databases.

Tài liệu tham khảo

Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th international conference very large data, bases, pp. 487–499 (1994) Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the 11th international conference on data engineering, pp. 3–14. IEEE (1995) Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 429–435. ACM (2002) Baralis, E., Chiusano, S., Dutto, R.: Applying sequential rules to protein localization prediction. Comput. Math. Appl. 55(5), 867–878 (2008) Vo, B., Hong, T.P., Le, B.: A lattice-based approach for mining most generalization association rules. Knowl. Based Syst. 45, 20–30 (2013) El-Sayed, M., Ruiz, C., Rundensteiner, E.A.: FS-Miner: efficient and incremental mining of frequent sequence patterns in web logs. In: Proceedings of the 6th annual ACM international workshop on web information and data management, pp. 128–135 (2004) Ezeife, C.I., Lu, Y., Liu, Y.: PLWAP sequential mining: open source code. In: Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, pp. 26–35 (2005) Fournier-Viger, P., Faghihi, U., Nkambou, R., Nguifo, E.M.: CMRules: mining sequential rules common to several sequences. Knowl. Based Syst. 25(1), 63–76 (2012) Fournier-Viger, P., Nkambou, R., Tseng, V.S.M.: RuleGrowth: mining sequential rules common to several sequences by pattern-growth. In: Proceedings of the 2011 ACM symposium on applied computing, pp. 956–961 (2011) Fournier-Viger, P., Wu, C.W., Tseng, V.S., Nkambou, R.: Mining sequential rules common to several sequences with the window size constraint. Adv. Artif. Intell. 299–304 (2012) Gouda, K., Hassaan, M., Zaki, M.J.: Prism: an effective approach for frequent sequence mining via prime-block encoding. J. Comput. Syst. Sci. 76(1), 88–102 (2010) Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.C. FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 355–359 (2000) Lo, D., Khoo, S.-C., Liu, C.: Efficient mining of recurrent rules from a sequence database. In: DASFAA 2008, LNCS vol. 4947, pp. 67–83 (2008) Lo, D., Khoo, S.C., Wong, L.: Non-redundant sequential rules—theory and algorithm. Inf. Syst. 34(4), 438–453 (2009) Masseglia, F., Cathala, F., Poncelet, P.: The PSP approach for mining sequential patterns. In: PKDD’98, Nantes, France, LNCS vol. 1510, pp. 176–184 (1998) Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Hsu, M.C.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. 16(11), 1424–1440 (2004) Pei, J., Han, J., Mortazavi-Asl, B., Zhu, H.: Mining access patterns efficiently from web logs. In: Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’00), Kyoto, Japan, pp. 396–407 (2000) Spiliopoulou, M.: Managing interesting rules in sequence mining. In: Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery, Prague, Czech Republic, pp. 554–560 (1999) Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, Avignon, France, LNCS, pp. 3–17 (1996) UCI Machine Learning Repository. http://www.ics.uci.edu/mlearn/MLRepository.html Van, T.T., Vo, B., Le, B.: Mining sequential rules based on prefix-tree. In: Proceedings of the 3rd Asian Conference on Intelligent Information and Database Systems, Daegu, Korea, pp. 147–156 (2011) Zaki, M.J.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1–2), 31–60 (2001)

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA