Mining frequent itemsets from streaming transaction data using genetic algorithms

Journal of Big Data - Tập 7 - Trang 1-20 - 2020
Sikha Bagui1, Patrick Stanley1
1Department of Computer Science, University of West Florida , Pensacola, USA

Tóm tắt

This paper presents a study of mining frequent itemsets from streaming data in the presence of concept drift. Streaming data, being volatile in nature, is particularly challenging to mine. An approach using genetic algorithms is presented, and various relationships between concept drift, sliding window size, and genetic algorithm constraints are explored. Concept drift is identified by changes in frequent itemsets. The novelty of this work lies in determining concept drift using frequent itemsets for mining streaming data, using the genetic algorithm framework. Formulas have been presented for calculating minimum support counts in streaming data using sliding windows. Testing highlighted that the ratio of the window size to transactions per drift was a key to good performance. Getting good results when the sliding window size was too small was a challenge since normal fluctuations in the data could appear to be a concept drift. Window size must be managed in conjunction with support and confidence values in order to achieve reasonable results. This method of detecting concept drift performed well when larger window sizes were used.

Tài liệu tham khảo

Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD international conference on management of data, Washington, D.C., USA. 1993. pp. 207–216. Aldallal AS. Avoiding premature convergence of GA in informational retrieval systems. Int J Intell Syst Appl Eng. 2015;2(4):80. https://doi.org/10.18201/ijisae.78975. Angelova M, Pencheva T. Tuning GA parameters to improve convergence time. Int J Chem Eng. 2011. https://doi.org/10.1155/2011/646917. Bull AD. Convergence rates of efficient global optimization algorithms. J Mach Learn Res. 2011;12(88):2879–904. Forrest S, Mitchell M. What makes a problem hard for a GA? Some anomalous results and their explanation. GAs Mach Learn. 1993. https://doi.org/10.1007/978-1-4615-2740-4_6. Fortin F-A, De Rainville F-M, Gardner M-A, Parizeau M, Gagné C. DEAP: evolutionary algorithms made easy. J Mach Learn Res. 2012;13:2171–5. Gama J. A survey on learning from data streams: current and future trends. Prog Arti Intell. 2012;1(1):45–55. https://doi.org/10.1007/s13748-011-0002-6. Ghosh S, Biswas S, Sarkar D, Sarkar PP. Mining frequent itemsets using GA. Int J Artif Intell Appl. 2010;1(4):133–43. https://doi.org/10.5121/ijaia.2010.1411. He J, Lin G. Average convergence rate of evolutionary algorithms. IEEE Trans Evol Comput. 2016;20(2):316–21. https://doi.org/10.1109/tevc.2015.2444793. Hoens TR, Polikar R, Chawla NV. Learning from streaming data with concept drift and imbalance: an overview. Prog Artif Intell. 2012;1(1):89–101. https://doi.org/10.1007/s13748-011-0008-0. Kim Y, Park CH. An efficient concept drift detection method for streaming data under limited labeling. IEICE Trans Inf Syst. 2017;100(10):2537–46. https://doi.org/10.1587/transinf.2017edp7091. Kolajo T, Daramola O, Adebiyi A. Big data stream analysis: a systematic literature review. J Big Data. 2019;6:47. https://doi.org/10.1186/s40537-019-0210-7. Krempl G, Zliobaite I, Brzezinski D, Hullermeier E, Last M, Lemaire V, Noack T, Shaker A, Sievi S, Spiliopoulou M, Stefanowski J. Open challenges for data stream mining research. ACM SIGKDD Explor Newsl. 2014;16(1):1–10. https://doi.org/10.1145/2674026.2674028. Lin W-Y, Lee W-Y, Hong T-P. Adapting crossover and mutation rates in GAs. J Inf Sci Eng. 2003;19(5):889–903. Nedunchezhian R, Geethanindhini K. Association rule mining on Big Data—a survey. Int J Eng Res Technol. 2016;5(5):42–6. Pellerin E, Pigeon L, Delisle S. Self-adaptive parameters in Genetic Algorithms. In: Proceeings SPIE 5433. Data mining and knowledge discovery: theory, tools, and technology VI. 2004. https://doi.org/10.1117/12.542156. Rabinovich Y, Wigderson A. Techniques for bounding the convergence rate of GAs. Random Struct Algorithms. 1999;14(2):111–38. https://doi.org/10.1002/(sici)1098-2418(199903)14:2%3c111:aid-rsa1%3e3.0.co;2-6. Rangaswamy S, Shobha G. Optimized association rule mining using GA. J Comput Sci Eng Inf Technol Res. 2012;2(1):1–9. Ruholla J-M, Smith BK. Fluid GA (FGA). J Comput Des Eng. 2017;4(2):158–67. https://doi.org/10.1016/j.jcde.2017.03.001. Vijayarani S, Sathya P. An efficient algorithm for mining frequent items in data streams. Int J Innov Res Comput Commun Eng. 2013;1(3):742–7. Wang H, Abraham Z. Concept drift detection for streaming data. In: 2015 international joint conference on neural networks (IJCNN). 2015. https://doi.org/10.1109/ijcnn.2015.7280398. Yu PS, Chi Y. Association rule mining on streams. Encyclopedia of Database Systems. 2009. https://doi.org/10.1007/springerreference_63188.