Probabilistic exact adaptive random forest for recurrent concepts in data streams

Ocean Wu1, Yun Sing Koh1, Gillian Dobbie1, Thomas Lacombe1
1School of Computer Science, The University of Auckland, Auckland, New Zealand

Tóm tắt

In order to adapt random forests to the dynamic nature of data streams, the state-of-the-art technique discards trained trees and grows new trees when concept drifts are detected. This is particularly wasteful when recurrent patterns exist. In this work, we introduce a novel framework called PEARL, which uses both an exact technique and a probabilistic graphical model with Lossy Counting, to replace drifted trees with relevant trees built in the past. The exact technique utilizes pattern matching to find the set of drifted trees that co-occurred in predictions in the past. Meanwhile, a probabilistic graphical model is being built to capture the tree replacements among recurrent concept drifts. Once the graphical model becomes stable, it replaces the exact technique and finds relevant trees in a probabilistic fashion. Further, Lossy Counting is applied to the graphical model which brings an added theoretical guarantee for both error rate and space complexity. We empirically show our technique outperforms baselines in terms of accuracy and kappa on both synthetic and real-world datasets.

Tài liệu tham khảo

Ahmadi, Z., Kramer, S.: Modeling recurring concepts in data streams: a graph-based framework. Knowl. Inf. Syst. 55(1), 15–44 (2018) Anderson, R., Koh, Y.S., Dobbie, G., Bifet, A.: Recurring concept meta-learning for evolving data streams. Expert Syst. Appl. 138, 112832 (2019) Ángel, A.M., Bartolo, G.J., Ernestina, M.: Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function. Expert Syst. Appl. 46, 87–105 (2016) Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010) Bifet, A., Holmes, G., Pfahringer, B.: Leveraging bagging for evolving data streams. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases, pp. 135–150. Springer, Berlin Heidelberg (2010) Bifet, A., Holmes, G., Pfahringer, B., Gavaldà, R.: Improving adaptive bagging methods for evolving data streams. In: Zhou, Z.H., Washio, T. (eds.) Advances in Machine Learning, pp. 23–37. Springer, Berlin Heidelberg (2009) Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001) Brzezinski, D., Stefanowski, J.: Combining block-based and online methods in learning ensembles from concept drifting data streams. Inf. Sci. 265, 50–67 (2014). https://doi.org/10.1016/j.ins.2013.12.011 Cano, A., Krawczyk, B.: Kappa updated ensemble for drifting data stream mining. Mach. Learn. 109, 175–218 (2020) Chen, K., Koh, Y.S., Riddle, P.: Proactive drift detection: Predicting concept drifts in data streams using probabilistic networks. In: IJCNN, pp. 780–787. IEEE (2016) Chiu, C.W., Minku, L.L.: Diversity-based pool of models for dealing with recurring concepts. In: 2018 IJCNN, pp. 1–8. IEEE (2018) Gama, J., Kosina, P.: Recurrent concepts in data streams classification. Knowl. Inf. Syst. 40(3), 489–507 (2014) Gao, Y., Chandra, S., Li, Y., Khan, L., Thuraisingham, B. M.: SACCOS: A semi–supervised framework for emerging class detection and concept drift adaption over data streams. In: IEEE Trans. Knowl. Data Eng. (2020). https://doi.org/10.1109/TKDE.2020.2993193 Ghomeshi, H., Gaber, M.M., Kovalchuk, Y.: Eacd: evolutionary adaptation to concept drifts in data streams. Data Min. Knowl. Disc. 33, 663–694 (2019) Gomes, H.M., Bifet, A., Read, J., Barddal, J.P., Enembreck, F., Pfharinger, B., Holmes, G., Abdessalem, T.: Adaptive random forests for evolving data stream classification. Mach. Learn. 106(9–10), 1469–1495 (2017) Gonçalves Jr., P.M., Barros, R.S.M.D.: RCD: A recurring concept drift framework. Pattern Recogn. Lett. 34(9), 1018–1025 (2013) Goyal, A., Daumé, H.: Lossy conservative update (lcu) sketch: Succinct approximate count storage. In: 25th AAAI (2011) Hu, J., Chen, J., Qin, X.: Algorithm of recurring concept drift base on main feature extraction. In: Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence, pp. 59–65 (2019) Koh, Y.S., Huang, D.T.J., Pearce, C., Dobbie, G.: Volatility drift prediction for transactional data streams. In: 2018 IEEE ICDM, pp. 1091–1096. IEEE (2018) Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Third IEEE International Conference on Data Mining, pp. 123–130 (2003) Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J., Woźniak, M.: Ensemble learning for data stream analysis: A survey. Information Fusion 37, 132–156 (2017) Krawczyk, B., Woźniak, M.: One-class classifiers with incremental learning and forgetting for data streams with concept drift. Soft. Comput. 19, 3387–3400 (2015) Li, P., Wu, X., Xuegang, H.: Mining recurring concept drifts with limited labeled streaming data. J. Mach. Learn. Res. - Proc. Track 13, 241–252 (2010). https://doi.org/10.1145/2089094.2089105 Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th VLDB, VLDB ’02, pp. 346–357. VLDB Endowment (2002) Masud, M.M., Al-Khateeb, T.M., Khan, L., Aggarwal, C., Gao, J., Han, J., Thuraisingham, B.: Detecting recurring and novel classes in concept-drifting data streams. In: 2011 IEEE 11th ICDM, pp. 1176–1181. IEEE (2011) Moreira dos Reis, D., Maletzke, A., Silva, D.F., Batista, G.E.: Classifying and counting with recurrent contexts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1983–1992 (2018) Wu, O., Koh, Y.S., Dobbie, G., Lacombe, T.: PEARL: Probabilistic exact adaptive random forest with lossy counting for data streams. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 17–30. Springer (2020) Yang, Y., Wu, X., Zhu, X.: Mining in anticipation for concept change: Proactive-reactive prediction in data streams. Data Min. Knowl. Disc. 13(3), 261–289 (2006)