Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project
Abstract
Keywords
References
International Diabetes Federation, http://www.diabetesatlas.org.
L Rydén, 2007, Guidelines on diabetes, pre-diabetes, and cardiovascular diseases: full text, European Heart Journal Supplements, 9, C3, 10.1093/eurheartj/ehl261
SP Juraschek, 2015, Cardiorespiratory fitness and incident diabetes: the FIT (Henry Ford ExercIse Testing) project, Diabetes Care, 38, 1075, 10.2337/dc14-2714
S Habibi, 2015, Type 2 Diabetes Mellitus Screening and Risk Factors Using Decision Tree: Results of Data Mining, Global journal of health science, 7, 304, 10.5539/gjhs.v7n5p304
M Zhu, 2015, Mortality rates and the causes of death related to diabetes mellitus in Shanghai Songjiang District: an 11-year retrospective analysis of death certificates, BMC endocrine disorders, 15, 45, 10.1186/s12902-015-0042-1
S Leahy, 2015, Prevalence and correlates of diagnosed and undiagnosed type 2 diabetes mellitus and pre-diabetes in older adults: Findings from the Irish Longitudinal Study on Ageing (TILDA), Diabetes research and clinical practice, 110, 241, 10.1016/j.diabres.2015.10.015
L Alhyas, 2012, Prevalence of type 2 diabetes in the States of the co-operation council for the Arab States of the Gulf: a systematic review, PloS one, 7, e40948, 10.1371/journal.pone.0040948
PT Williams, 2008, Vigorous exercise, fitness and incident hypertension, high cholesterol, and diabetes, Medicine and science in sports and exercise, 40, 998, 10.1249/MSS.0b013e31816722a9
S Wild, 2004, Global prevalence of diabetes estimates for the year 2000 and projections for 2030, Diabetes care, 27, 1047, 10.2337/diacare.27.5.1047
D Statistics, 1999, National Institute of Diabetes and Digestive and Kidney Diseases, 99
I Kononenko, 2001, Machine learning for medical diagnosis: history, state of the art and perspective, Artificial Intelligence in medicine, 23, 89, 10.1016/S0933-3657(01)00077-X
MH Al-Mallah, 2014, Rationale and design of the Henry Ford Exercise Testing Project (the FIT project), Clinical cardiology, 37, 456, 10.1002/clc.22302
AL Blum, 1997, Selection of relevant features and examples in machine learning, Artificial intelligence, 97, 245, 10.1016/S0004-3702(97)00063-5
I Guyon, 2003, An introduction to variable and feature selection, Journal of machine learning research, 3, 1157
JT Kent, 1983, Information gain and a general measure of correlation, Biometrika, 70, 163, 10.1093/biomet/70.1.163
SB Kotsiantis, 2007, Supervised machine learning: A review of classification techniques
XH Meng, 2013, Comparison of three data mining models for predicting diabetes or prediabetes by risk factors, The Kaohsiung journal of medical sciences, 29, 93, 10.1016/j.kjms.2012.08.016
SE Stern, 2005, Identification of individuals with insulin resistance using routine clinical measurements, Diabetes, 54, 333, 10.2337/diabetes.54.2.333
JL Breault, 2002, Data mining a diabetic data warehouse, Artificial intelligence in medicine, 26, 37, 10.1016/S0933-3657(02)00051-9
JR Quinlan, 2014, C4.5: programs for machine learning
R Kohavi, 1996, KDD, vol. 96, 202
S Le Cessie, 1992, Ridge estimators in logistic regression, Applied statistics, 191, 10.2307/2347628
John GH, Langley P. Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc.; 1995. p. 338–345.
Sumner M, Frank E, Hall M. Speeding up logistic model tree induction. In: European Conference on Principles of Data Mining and Knowledge Discovery. Springer; 2005. p. 675–683.
A Liaw, 2002, Classification and regression by randomForest, R news, 2, 18
GE Batista, 2004, A study of the behavior of several methods for balancing machine learning training data, ACM Sigkdd Explorations Newsletter, 6, 20, 10.1145/1007730.1007735
G Menardi, 2014, Training and assessing classification rules with imbalanced data, Data Mining and Knowledge Discovery, 28, 92, 10.1007/s10618-012-0295-5
V Ganganwar, 2012, An overview of classification algorithms for imbalanced datasets, International Journal of Emerging Technology and Advanced Engineering, 2, 42
H He, 2009, Learning from imbalanced data, IEEE Transactions on knowledge and data engineering, 21, 1263, 10.1109/TKDE.2008.239
Poolsawad N, Kambhampati C, Cleland J. Balancing class for performance of classification with a clinical dataset. In: Proceedings of the World Congress on Engineering. vol. 1; 2014.
Wang J, Xu M, Wang H, Zhang J. Classification of imbalanced data by using the SMOTE algorithm and locally linear embedding. In: 2006 8th international Conference on Signal Processing. vol. 3. IEEE; 2006.
García V, Alejo R, Sánchez JS, Sotoca JM, Mollineda RA. Combined effects of class imbalance and class overlap on instance-based classification. In: International Conference on Intelligent Data Engineering and Automated Learning. Springer; 2006. p. 371–378.
CR Jack, 2008, The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods, Journal of Magnetic Resonance Imaging, 27, 685, 10.1002/jmri.21049
L Lusa, 2015, Joint use of over- and under-sampling techniques and cross-validation for the development and assessment of prediction models, BMC bioinformatics, 16, 1
NV Chawla, 2005, Data mining and knowledge discovery handbook, 853
P Refaeilzadeh, 2009, Encyclopedia of database systems, 532
JH Kim, 2009, Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap, Computational Statistics & Data Analysis, 53, 3735, 10.1016/j.csda.2009.04.009
R Kohavi, 1995, IJCAI, vol. 14, 1137
Y Bengio, 2004, No unbiased estimator of the variance of k-fold cross-validation, Journal of Machine Learning Research, 5, 1089
B Liu, 2015, Identification of real microRNA precursors with a pseudo structure status composition approach, PloS one, 10, e0121501, 10.1371/journal.pone.0121501
B Liu, 2016, iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach, Journal of Biomolecular Structure and Dynamics, 34, 223, 10.1080/07391102.2015.1014422
Y Zhang, 2014, Abstract and Applied Analysis, vol. 2014
B Liu, 2016, Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning, IEEE transactions on nanobioscience, 15, 328, 10.1109/TNB.2016.2555951
B Liu, 2016, iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework, Bioinformatics, 32, 2411, 10.1093/bioinformatics/btw186
B Liu, 2017, iRSpot-EL: identify recombination spots with an ensemble learning approach, Bioinformatics, 33, 35, 10.1093/bioinformatics/btw539
L Song, 2014, nDNA-prot: identification of DNA-binding proteins based on unbalanced classification, BMC bioinformatics, 15, 298, 10.1186/1471-2105-15-298
C Wang, 2015, imDC: an ensemble learning method for imbalanced classification with miRNA data, Genetics and Molecular Research, 14, 123, 10.4238/2015.January.15.15
G Seni, 2010, Ensemble methods in data mining: improving accuracy through combining predictions, Synthesis Lectures on Data Mining and Knowledge Discovery, 2, 1, 10.2200/S00240ED1V01Y200912DMK002
B Farran, 2013, Predictive models to assess risk of type 2 diabetes, hypertension and comorbidity: machine-learning algorithms and validation using national health data from Kuwait—a cohort study, BMJ open, 3, e002457, 10.1136/bmjopen-2012-002457