Handling imbalanced datasets through Optimum-Path Forest

Knowledge-Based Systems - Volume 242 - Page 108445 - 2022
Leandro Aparecido Passos¹, Danilo S. Jodas¹, Luiz C.F. Ribeiro¹, Marco Akio², Andre Nunes de Souza², João Paulo Papa¹
¹ Department of Computing, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01, Bauru, 17033-360, Brazil
² Department of Electrical Engineering, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01, Bauru, 17033-360, Brazil

References

More, 2021, Review of imbalanced data classification and approaches relating to real-time applications, 1
Kumar, 2021, Classification of imbalanced data: Review of methods and applications, 1099
Vuttipittayamongkol, 2021, On the class overlap problem in imbalanced data classification, Knowl.-Based Syst., 212, 10.1016/j.knosys.2020.106631
Wang, 2021, Review of classification methods on unbalanced data sets, IEEE Access, 9, 64606, 10.1109/ACCESS.2021.3074243
O’Brien, 2019, A random forests quantile classifier for class imbalanced data, Pattern Recognit., 90, 232, 10.1016/j.patcog.2019.01.036
Chen, 2004, 1
Wang, 2021, Ponzi scheme detection via oversampling-based long short-term memory for smart contracts, Knowl.-Based Syst., 228, 10.1016/j.knosys.2021.107312
Jiang, 2021, A wind turbine frequent principal fault detection and localization approach with imbalanced data using an improved synthetic oversampling technique, Int. J. Electr. Power Energy Syst., 126, 10.1016/j.ijepes.2020.106595
Sleeman IV, 2021, Multi-class imbalanced big data classification on Spark, Knowl.-Based Syst., 212
Oksuz, 2021, Imbalance problems in object detection: A review, IEEE Trans. Pattern Anal. Mach. Intell., 43, 3388, 10.1109/TPAMI.2020.2981890
Jing, 2021, Multiset feature learning for highly imbalanced data classification, IEEE Trans. Pattern Anal. Mach. Intell., 43, 139, 10.1109/TPAMI.2019.2929166
Huang, 2020, Deep imbalanced learning for face recognition and attribute prediction, IEEE Trans. Pattern Anal. Mach. Intell., 42, 2781, 10.1109/TPAMI.2019.2914680
Kovács, 2019, An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets, Appl. Soft Comput., 83, 10.1016/j.asoc.2019.105662
Malhotra, 2019, An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data, Neurocomputing, 343, 120, 10.1016/j.neucom.2018.04.090
Cordón, 2018, Imbalance: Oversampling algorithms for imbalanced classification in R, Knowl.-Based Syst., 161, 329, 10.1016/j.knosys.2018.07.035
Sáez, 2016, Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets, Pattern Recognit., 57, 164, 10.1016/j.patcog.2016.03.012
Chawla, 2002, SMOTE: Synthetic minority over-sampling technique, J. Artificial Intelligence Res., 16, 321, 10.1613/jair.953
Han, 2005, Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning, 878
He, 2008, ADASYN: Adaptive synthetic sampling approach for imbalanced learning, 1322
Liang, 2020, LR-SMOTE — An improved unbalanced data set oversampling based on K-means and SVM, Knowl.-Based Syst., 196, 10.1016/j.knosys.2020.105845
Elyan, 2021, CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification, Neural Comput. Appl., 33, 2839, 10.1007/s00521-020-05130-z
Barua, 2014, MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning, IEEE Trans. Knowl. Data Eng., 26, 405, 10.1109/TKDE.2012.232
Douzas, 2018, Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE, Inform. Sci., 465, 1, 10.1016/j.ins.2018.06.056
Douzas, 2017, Self-organizing map oversampling (SOMO) for imbalanced data set learning, Expert Syst. Appl., 82, 40, 10.1016/j.eswa.2017.03.073
Cohen, 2006, Learning from imbalanced data in surveillance of nosocomial infection, Artif. Intell. Med., 37, 7, 10.1016/j.artmed.2005.03.002
Mani, 2003, kNN approach to unbalanced data distributions: a case study involving information extraction, in: Proceedings of Workshop on Learning from Imbalanced Datasets, 126
Hart, 1968, The condensed nearest neighbor rule (Corresp.), IEEE Trans. Inform. Theory, 14, 515, 10.1109/TIT.1968.1054155
Koziarski, 2020, Radial-based undersampling for imbalanced data classification, Pattern Recognit., 102, 10.1016/j.patcog.2020.107262
Cervellera, 2020, Voronoi tree models for distribution-preserving sampling and generation, Pattern Recognit., 97, 10.1016/j.patcog.2019.107002
Papa, 2009, Supervised pattern classification based on optimum-path forest, Int. J. Imaging Syst. Technol., 19, 120, 10.1002/ima.20188
Papa, 2012, Efficient supervised optimum-path forest classification for large datasets, Pattern Recognit., 45, 512, 10.1016/j.patcog.2011.07.013
Papa, 2017, Optimum-Path Forest based on k-connectivity: Theory and applications, Pattern Recognit. Lett., 87, 117, 10.1016/j.patrec.2016.07.026
Rocha, 2009, Data clustering as an optimum-path forest problem with applications in image analysis, Int. J. Imaging Syst. Technol., 19, 50, 10.1002/ima.20191
Guimarães, 2018, Intelligent network security monitoring based on optimum-path forest clustering, IEEE Netw., 33, 126, 10.1109/MNET.2018.1800151
Souza, 2020, A novel approach for optimum-path forest classification using fuzzy logic, IEEE Trans. Fuzzy Syst., 28, 3076, 10.1109/TFUZZ.2019.2949771
Rosa, 2014, On the training of artificial neural networks with radial basis function using optimum-path forest clustering, 1472
Afonso, 2018, Enhancing brain storm optimization through optimum-path forest, 000183
Passos, 2020, O2PF: Oversampling via optimum-path forest for breast cancer detection, 498
Fernandes, 2019, Improving optimum-path forest learning using bag-of-classifiers and confidence measures, Pattern Anal. Appl., 22, 703, 10.1007/s10044-017-0677-9
Dua, 2017
Duval, 2001, Interpretation of gas-in-oil analysis using new IEC publication 60599 and IEC TC 10 databases, IEEE Electr. Insul. Mag., 17, 31, 10.1109/57.917529
Lupi Filho, 2012
Ghoneim, 2012, Dissolved gas analysis as a diagnostic tools for early detection of transformer faults, Adv. Electr. Eng. Syst., 1, 152
Soni, 2016, An approach to diagnose incipient faults of power transformer using dissolved gas analysis of mineral oil by ratio methods using fuzzy logic, 1894
Equbal, 2018, Transformer incipient fault diagnosis on the basis of energy-weighted DGA using an artificial neural network, Turk. J. Electr. Eng. Comput. Sci., 26, 77, 10.3906/elk-1704-229
Wilcoxon, 1945, Individual comparisons by ranking methods, Biom. Bull., 1, 80, 10.2307/3001968
de Rosa, 2020
de Rosa, 2021, OPFython: A Python implementation for Optimum-Path Forest, Softw. Impacts, 9, 10.1016/j.simpa.2021.100113
Maaten, 2008, Visualizing data using t-SNE, J. Mach. Learn. Res., 9, 2579