Predicting Diabetes Mellitus With Machine Learning Techniques

Quan Zou1,2, Kaiyang Qu2, Yamei Luo3, Dehui Yin3, Ying Ju4, Hua Tang5
1Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
2School of Computer Science and Technology, Tianjin University, Tianjin, China
3School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
4School of Information Science and Technology, Xiamen University, Xiamen, China
5Department of Pathophysiology, School of Basic Medicine, Southwest Medical University, Luzhou, China

Abstract

Keywords

