New boosting approaches for improving cluster-based undersampling in problems with imbalanced data

Decision Analytics Journal - Volume 8 - Page 100316 - 2023
Abdullah-All-Tanvir¹, Iftakhar Ali Khandokar¹, Swakkhar Shatabda¹,²
¹Department of Computer Science and Engineering, United International University, Plot-2, United City, Madani Avenue, Badda, Dhaka 1212, Bangladesh
²Center for Artificial Intelligence and Robotics (CAIR), United International University, Plot-2, United City, Madani Avenue, Badda, Dhaka 1212, Bangladesh

References

Mondal, 2021, Handling imbalanced data for credit card fraud detection, 1
Sakar, 2019, Real-time prediction of online shoppers’ purchasing intention using multilayer perceptron and LSTM recurrent neural networks, Neural Comput. Appl., 31, 6893, 10.1007/s00521-018-3523-0
Chowdhury, 2017, iDNAProt-ES: Identification of DNA-binding proteins using evolutionary and structural features, Sci. Rep., 7, 14938, 10.1038/s41598-017-14945-1
Muhammod, 2019, PyFeat: a Python-based effective feature generation tool for DNA, RNA and protein sequences, Bioinformatics, 35, 3831, 10.1093/bioinformatics/btz165
Rayhan, 2017, iDTI-ESBoost: Identification of drug target interaction using evolutionary and structural features with boosting, Sci. Rep., 7, 17731, 10.1038/s41598-017-18025-2
Dehzangi, 2022, iProtGly-SS: A tool to accurately predict protein glycation site using structural-based features, 125
Progga, 2022, iressenet: An accurate convolutional neural network for retinal blood vessel segmentation, 567
Ahmad, 2020, Enhanced prediction of lysine propionylation sites using bi-peptide evolutionary features resolving data imbalance, 1668
Arafat, 2020, Accurately predicting glutarylation sites using sequential bi-peptide-based evolutionary features, Genes, 11, 1023, 10.3390/genes11091023
Rayhan, 2019, CFSBoost: cumulative feature subspace boosting for drug-target interaction prediction, J. Theoret. Biol., 464, 1, 10.1016/j.jtbi.2018.12.024
Rayhan, 2017, CUSBoost: Cluster-based under-sampling with boosting for imbalanced classification, 1
Islam, 2018, iProtGly-SS: Identifying protein glycation sites using sequence and structure based features, Proteins Struct. Funct. Bioinform., 86, 777, 10.1002/prot.25511
Koziarski, 2020, Radial-based undersampling for imbalanced data classification, Pattern Recognit., 102, 10.1016/j.patcog.2020.107262
Tsai, 2019, Under-sampling class imbalanced datasets by combining clustering analysis and instance selection, Inform. Sci., 477, 47, 10.1016/j.ins.2018.10.029
Krawczyk, 2019, Radial-based oversampling for multiclass imbalanced data classification, IEEE Trans. Neural Netw. Learn. Syst., 31, 2818, 10.1109/TNNLS.2019.2913673
Liu, 2019, DeepBalance: Deep-learning and fuzzy oversampling for vulnerability detection, IEEE Trans. Fuzzy Syst., 28, 1329
Lin, 2017, Clustering-based undersampling in class-imbalanced data, Inform. Sci., 409, 17, 10.1016/j.ins.2017.05.008
Yen, 2009, Cluster-based under-sampling approaches for imbalanced data distributions, Expert Syst. Appl., 36, 5718, 10.1016/j.eswa.2008.06.108
Saha, 2022, Cluster-oriented instance selection for classification problems, Inform. Sci., 602, 143, 10.1016/j.ins.2022.04.036
Khandokar, 2022, A clustering based priority driven sampling technique for imbalance data classification, 176
Liu, 2020, Dealing with class imbalance in classifier chains via random undersampling, Knowl.-Based Syst., 192, 10.1016/j.knosys.2019.105292
Rekha, 2019, Cluster-based under-sampling using farthest neighbour technique for imbalanced datasets, 35
Peng, 2019, Trainable undersampling for class-imbalance learning, Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 4707
Zhang, 2019, Evolutionary-based ensemble under-sampling for imbalanced data, 212
Huang, 2019, Deep imbalanced learning for face recognition and attribute prediction, IEEE Trans. Pattern Anal. Mach. Intell., 42, 2781, 10.1109/TPAMI.2019.2914680
Ng, 2014, Diversified sensitivity-based undersampling for imbalance classification problems, IEEE Trans. Cybern., 45, 2402, 10.1109/TCYB.2014.2372060
Nwe, 2019, KNN-based overlapping samples filter approach for classification of imbalanced data, 55
Devi, 2020, A boosting-aided adaptive cluster-based undersampling approach for treatment of class imbalance problem, Int. J. Data Warehous. Min. (IJDWM), 16, 60, 10.4018/IJDWM.2020070104
Zhang, 2019, Undersampling near decision boundary for imbalance problems, 1
Le, 2018, A cluster-based boosting algorithm for bankruptcy prediction in a highly imbalanced dataset, Symmetry, 10, 250, 10.3390/sym10070250
Vuttipittayamongkol, 2020, Neighbourhood-based undersampling approach for handling imbalanced and overlapped data, Inform. Sci., 509, 47, 10.1016/j.ins.2019.08.062
Wang, 2020, Entropy and confidence-based undersampling boosting random forests for imbalanced problems, IEEE Trans. Neural Netw. Learn. Syst., 31, 5178, 10.1109/TNNLS.2020.2964585
Chawla, 2002, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res., 16, 321, 10.1613/jair.953
Vishwakarma, 2022, DIDS: A deep neural network based real-time intrusion detection system for IoT, Decis. Anal. J., 5
Moreno-Garcia, 2023, A novel application of machine learning and zero-shot classification methods for automated abstract screening in systematic reviews, Decis. Anal. J., 10.1016/j.dajour.2023.100162
Afriyie, 2023, A supervised machine learning algorithm for detecting and predicting fraud in credit card transactions, Decis. Anal. J., 6
Song, 2016, A bi-directional sampling based on k-means method for imbalance text classification, 1
Shangguan, 2021, Abnormal samples oversampling for anomaly detection based on uniform scale strategy and closed area, IEEE Trans. Knowl. Data Eng.
Peng, 2015, Adaptive sampling with optimal cost for class-imbalance learning, Proceedings of the AAAI Conference on Artificial Intelligence, 29, 1
Xu, 2021, A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data, Inform. Sci., 572, 574, 10.1016/j.ins.2021.02.056
Shi, 2020, Fault diagnosis of an autonomous vehicle with an improved SVM algorithm subject to unbalanced datasets, IEEE Trans. Ind. Electron., 68, 6248, 10.1109/TIE.2020.2994868
Bennin, 2017, MAHAKIL: Diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction, IEEE Trans. Softw. Eng., 44, 534, 10.1109/TSE.2017.2731766
Tao, 2020, Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering, Inform. Sci., 519, 43, 10.1016/j.ins.2020.01.032
Chen, 2021, A hybrid data-level ensemble to enable learning from highly imbalanced dataset, Inform. Sci., 554, 157, 10.1016/j.ins.2020.12.023
Zhang, 2018, An approach to class imbalance problem based on stacking and inverse random under sampling methods, 1
Yang, 2019, Manifold distance-based over-sampling technique for class imbalance learning, Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 10071
Li, 2021, A binary PSO-based ensemble under-sampling model for rebalancing imbalanced training data, J. Supercomput., 1
Ahmed, 2019, LIUBoost: locality informed under-boosting for imbalanced data classification, 133
Tao, 2019, Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification, Inform. Sci., 487, 31, 10.1016/j.ins.2019.02.062
Lee, 2021, GAN-based imbalanced data intrusion detection system, Pers. Ubiquitous Comput., 25, 121, 10.1007/s00779-019-01332-y
Zhou, 2020, Deep learning fault diagnosis method based on global optimization GAN for unbalanced data, Knowl.-Based Syst., 187, 10.1016/j.knosys.2019.07.008
Ren, 2019, EWGAN: Entropy-based Wasserstein GAN for imbalanced learning, Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 10011
Thejas, 2022, An extension of synthetic minority oversampling technique based on Kalman filter for imbalanced datasets, Mach. Learn. Appl., 8
Ahmed, 2022, Predicting severely imbalanced data disk drive failures with machine learning models, Mach. Learn. Appl., 9
Temraz, 2022, Solving the class imbalance problem using a counterfactual method for data augmentation, Mach. Learn. Appl., 9
Freund, 1996, Experiments with a new boosting algorithm, 148