Construction of Xinjiang metabolic syndrome risk prediction model based on interpretable models
Abstract
We aimed to construct simple and practical metabolic syndrome (MetS) risk prediction models using data from residents of Urumqi, and to provide a methodological reference for the prevention and control of MetS. This cross-sectional study was conducted in the Xinjiang Uygur Autonomous Region of China. We collected data from residents of Urumqi from 2018 to 2019, including demographic characteristics, anthropometric indicators, living habits, and family history. Resampling techniques were used to address class imbalance in the data, and MetS risk prediction models were then constructed using logistic regression (LR) and a decision tree (DT). Nomograms and DT tree diagrams were used to explain and visualize the models.
Of the 25,542 participants included in the study, 3,267 (12.8%) were diagnosed with MetS and 22,275 (87.2%) were not. Both the LR and DT models built on the random undersampling dataset had good AUROC values (0.846 and 0.913, respectively). The accuracy, sensitivity, specificity, and AUROC of the DT model were all higher than those of the LR model. On the random undersampling dataset, the LR model showed that exercise such as walking (OR = 0.769) and running (OR = 0.736) was a protective factor against MetS, whereas age 60–74 years (OR = 1.388), previous diabetes (OR = 8.902), previous hypertension (OR = 2.830), fatty liver (OR = 3.306), smoking (OR = 1.541), high systolic blood pressure (SBP; OR = 1.044), and high diastolic blood pressure (DBP; OR = 1.072) were risk factors for MetS. The DT model was 7 layers deep and had 18 leaves. BMI was the root node of the DT and the most important factor affecting MetS, followed, in descending order of importance, by SBP, previous diabetes, previous hypertension, DBP, fatty liver, smoking, and exercise.
Both the DT and LR MetS risk prediction models showed good predictive performance, and each has its own characteristics. Combining the two methods to construct an interpretable MetS risk prediction model can provide a methodological reference for the prevention and control of MetS.
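The resampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation. The function names (`random_undersample`, `confusion_metrics`, `auroc`), the fixed seed, and the toy scores are assumptions made for illustration. Only the class counts (3,267 MetS and 22,275 non-MetS) come from the abstract.

```python
import random


def random_undersample(labels, seed=0):
    """Keep every minority-class row and an equal-sized random sample
    of majority-class rows. Returns the kept row indices, sorted."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5).
    Quadratic in sample size, so suitable only for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class counts taken from the abstract: 1 = MetS, 0 = non-MetS.
labels = [1] * 3267 + [0] * 22275
kept = random_undersample(labels)
balanced = [labels[i] for i in kept]
print(len(balanced), sum(balanced))  # total rows, MetS rows

# Hypothetical toy scores, only to exercise the metric functions.
toy_true = [1, 1, 0, 0]
toy_score = [0.9, 0.4, 0.6, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_score]
print(confusion_metrics(toy_true, toy_pred), auroc(toy_true, toy_score))
```

In this sketch, undersampling discards majority-class rows rather than synthesizing minority-class rows, so the balanced set has twice the number of MetS cases. In practice the resampling would normally be applied to the training split only, with metrics computed on an untouched test split.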