Diabetes type 2 classification using machine learning algorithms with up-sampling technique
Abstract
The prevalence of diabetes, a chronic disease, has risen sharply in recent years. Diabetes raises blood sugar levels and can lead to complications such as blurred vision, kidney failure, nerve damage, and stroke. Researchers have built various models to predict diabetes. In this paper, four machine learning models, namely the gradient boosting classifier, AdaBoost classifier, decision tree classifier, and extra trees classifier (ETC), are used to identify diabetes. The models analyze the PIMA Indian Diabetes dataset (PIMA) and the Behavioral Risk Factor Surveillance System (BRFSS) diabetes dataset to classify patients as having a positive or negative diagnosis. For each dataset, 80% of the records are used as training data and 20% as testing data. The extra trees classifier outperformed the other models, with an area under the curve (AUC) of 0.96 for PIMA and 0.99 for BRFSS. It is therefore suggested that healthcare providers can use the ETC model to predict diabetes.
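The workflow summarized above (an 80/20 split, up-sampling of the minority class as named in the title, four tree-based classifiers, and AUC as the metric) can be sketched with scikit-learn as follows. This is a minimal illustration and not the authors' code: the data are a synthetic stand-in generated with `make_classification` rather than PIMA or BRFSS, all hyperparameters are library defaults, and the choice to up-sample only the training split (so that duplicated rows cannot leak into the test set) is an assumption, since the abstract does not state where up-sampling was applied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Assumption: synthetic, imbalanced stand-in data (768 rows, 8 features),
# used only so the sketch runs without downloading PIMA or BRFSS.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

# 80% training data, 20% testing data, stratified on the label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

# Random up-sampling: draw minority rows with replacement until both
# classes have the same number of training rows.
counts = np.bincount(y_train)
minority_label = int(counts.argmin())
majority_label = int(counts.argmax())
X_min = X_train[y_train == minority_label]
X_maj = X_train[y_train == majority_label]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([
    np.full(len(X_maj), majority_label),
    np.full(len(X_min_up), minority_label),
])

models = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "extra_trees": ExtraTreesClassifier(random_state=0),
}

# Fit each model on the balanced training set and score AUC on the
# untouched test set, using the predicted probability of class 1.
auc_scores = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    proba = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: AUC = {auc_scores[name]:.3f}")
```

Note that AUC is a value between 0 and 1, not a percentage. Scores on the synthetic data say nothing about the results reported for PIMA or BRFSS.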