Efficient learning from big data for cancer risk modeling: A case study with melanoma

Computers in Biology and Medicine - Volume 110 - Pages 29-39 - 2019
Aaron N. Richter¹, Taghi M. Khoshgoftaar¹
¹Department of Computer & Electrical Engineering and Computer Science, College of Engineering and Computer Science, Florida Atlantic University, 777 Glades Road EE 403, Boca Raton, FL 33431-0991, USA