Machine Learning and Data Mining Methods in Diabetes Research

Computational and Structural Biotechnology Journal, Volume 15, Pages 104-116, 2017
Ioannis Kavakiotis (1,2), Olga Tsave (3), Athanasios Salifoglou (3), Nicos Maglaveras (2,4), Ioannis Vlahavas (1), Ioanna Chouvarda (2,4)
(1) Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
(2) Institute of Applied Biosciences, CERTH, Thessaloniki, Greece
(3) Laboratory of Inorganic Chemistry, Department of Chemical Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
(4) Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
