Ensemble Model of Light Gradient Boosting Machine and AdaBoost for Type 2 Diabetes Prediction
Abstract
Machine learning helps build predictive models for clinical data analysis, stock price prediction, image recognition, financial modelling, and disease prediction and diagnosis. This paper proposes ensemble machine learning algorithms for diabetes prediction. The ensembles combine k-NN, Gaussian Naive Bayes, Random Forest (RF), AdaBoost, and the recently designed Light Gradient Boosting Machine (LightGBM). The proposed ensembles inherit the detection ability of LightGBM to improve accuracy. Under five-fold cross-validation, the proposed ensemble models perform better than other recent models. The combination of k-NN, AdaBoost, and LightGBM achieves a detection accuracy of 90.76%. Receiver operating characteristic (ROC) curve analysis shows that the combination of k-NN, RF, and LightGBM successfully handles the class imbalance problem of the base dataset.
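The abstract does not say how the base learners are combined, so the following is only a minimal sketch that assumes hard (majority) voting, one common way to build such an ensemble. The function names (majority_vote, k_fold_indices, accuracy) and the toy labels are hypothetical and written for illustration. They are not taken from the paper and do not reproduce its results.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Combine several models' label lists into one list by majority vote.

    `predictions` holds one list of labels per model, all of equal length.
    """
    combined = []
    for labels in zip(*predictions):
        # Pick the most frequent label among the models for this sample.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs for plain, unshuffled k-fold CV."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


# Toy labels (assumed, illustrative only): 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
# Hypothetical outputs standing in for k-NN, AdaBoost and LightGBM.
knn_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
ada_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
lgbm_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

ensemble_pred = majority_vote([knn_pred, ada_pred, lgbm_pred])
print("ensemble accuracy:", accuracy(y_true, ensemble_pred))

# Five-fold split of the ten toy samples: each fold holds out two of them.
for train_idx, test_idx in k_fold_indices(len(y_true), n_folds=5):
    print("held-out samples:", test_idx)
```

In this toy case each stand-in model errs on different samples, so the vote corrects every individual mistake. With real classifiers the gain depends on how correlated their errors are. In practice the folds would usually be shuffled and stratified by class, which this sketch leaves out for brevity.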
Keywords
#machine learning #diabetes prediction #ensemble algorithms #accuracy #class imbalance