Active broad learning with multi-objective evolution for data stream classification

Complex & Intelligent Systems - Pages 1-18 - 2023
Jian Cheng1,2, Zhiji Zheng3, Yinan Guo3,4,5, Jiayang Pu5, Shengxiang Yang6
1Research Institute of Mine Big Data, China Coal Research Institute, Beijing, People’s Republic of China
2Tiandi Science and Technology Co., Ltd, Beijing, People’s Republic of China
3School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, People’s Republic of China
4Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, People’s Republic of China
5School of Mechanical Electronic and Information Engineering, China University of Mining and Technology (Beijing), Beijing, People’s Republic of China
6School of Computer Science and Informatics, De Montfort University, The Gateway, Leicester, UK

Abstract

In a streaming environment, the features and labels of instances may change over time, giving rise to concept drift. Previous studies on data stream learning generally assume that the true label of every instance is available or easy to obtain, which is unrealistic in many real-world applications because labeling is expensive in time and labor. To address this issue, an active broad learning method based on multi-objective evolutionary optimization is presented for classifying non-stationary data streams. Each instance arriving at a time step is stored, in order, into a chunk. Once the chunk is full, its data distribution is compared with those of previous chunks by fast local drift detection to find potential concept drift. Taking into account both the diversity of instances and their relevance to the new concept, a multi-objective evolutionary algorithm is introduced to search for the most valuable candidate instances. From these candidates, representative instances are randomly selected to query their ground-truth labels, which are then used to update the broad learning model for drift adaptation. In particular, the number of representative instances is determined by the stability of adjacent historical chunks. Experimental results on 7 synthetic and 5 real-world datasets show that the proposed method outperforms five state-of-the-art methods in both classification accuracy and labeling cost, owing to its accurate identification of drift regions and its adaptively adjusted labeling budget.
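The chunk-based query loop described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' method: the abstract does not define the fast local drift detector, the two objectives, or the budget formula, so every helper below (`drift_score`, `objectives`, `label_budget`, `process_stream`) is hypothetical. The drift score is assumed to be the distance between chunk centroids, "diversity" is assumed to be the mean distance to the other instances in the chunk, and "relevance" is assumed to be closeness to the new chunk's centroid. Instead of a multi-objective evolutionary algorithm, the sketch computes the Pareto front exactly by brute force, which is only practical for small chunks. The broad learning model update is omitted.

```python
import math
import random
from statistics import mean
from typing import List, Optional

Point = List[float]

# HYPOTHETICAL sketch: drift score, objectives and budget rule are
# illustrative assumptions, not the definitions used in the paper.


def euclid(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(chunk: List[Point]) -> Point:
    """Per-feature mean of a chunk."""
    return [mean(col) for col in zip(*chunk)]


def drift_score(old: List[Point], new: List[Point]) -> float:
    """Assumed stand-in for drift detection: distance between chunk centroids."""
    return euclid(centroid(old), centroid(new))


def objectives(i: int, chunk: List[Point], center: Point) -> tuple:
    """Two objectives, both maximized (assumed definitions):
    diversity = mean distance to the other instances in the chunk,
    relevance = negative distance to the new chunk's centroid."""
    others = [chunk[j] for j in range(len(chunk)) if j != i]
    diversity = mean(euclid(chunk[i], y) for y in others) if others else 0.0
    relevance = -euclid(chunk[i], center)
    return diversity, relevance


def dominates(a: tuple, b: tuple) -> bool:
    """Pareto dominance for maximization."""
    return all(p >= q for p, q in zip(a, b)) and any(p > q for p, q in zip(a, b))


def pareto_front(chunk: List[Point]) -> List[Point]:
    """Brute-force non-dominated set (replaces the evolutionary search here)."""
    center = centroid(chunk)
    scores = [objectives(i, chunk, center) for i in range(len(chunk))]
    return [chunk[i] for i, s in enumerate(scores)
            if not any(dominates(t, s) for t in scores)]


def label_budget(history: List[float], b_min: int = 1, b_max: int = 10,
                 scale: float = 1.0, window: int = 3) -> int:
    """Assumed rule: query fewer labels when recent chunks were stable."""
    if not history:
        return b_max
    instability = mean(history[-window:])
    frac = min(1.0, instability / scale)
    return b_min + round(frac * (b_max - b_min))


def process_stream(stream: List[Point], chunk_size: int, threshold: float,
                   rng: random.Random) -> List[List[Point]]:
    """Buffer instances into chunks; on suspected drift, randomly pick
    representatives from the Pareto front as label queries."""
    buffer: List[Point] = []
    prev: Optional[List[Point]] = None
    history: List[float] = []
    queries: List[List[Point]] = []
    for x in stream:
        buffer.append(x)
        if len(buffer) < chunk_size:
            continue
        chunk, buffer = buffer, []
        if prev is not None:
            score = drift_score(prev, chunk)
            history.append(score)  # drift between adjacent chunks
            if score > threshold:
                front = pareto_front(chunk)
                k = min(label_budget(history), len(front))
                queries.append(rng.sample(front, k))
        prev = chunk
    return queries
```

For example, `process_stream(points, 5, 1.0, random.Random(0))` returns one list of instances to send to an annotator for each chunk whose centroid moved by more than the threshold. Tying the budget to the recent drift history mirrors the abstract's idea that stable streams need fewer labels, but the linear mapping used here is only a placeholder.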

Keywords

#data stream learning #concept drift #evolutionary optimization #data classification #active learning model
