Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

Jaime Lynn Speiser [1], Kathryn E. Callahan [2], Denise K. Houston [2], Jason Fanning [3], Thomas M. Gill [4], Jack M. Guralnik [5], Anne B. Newman [6], Marco Pahor [7], W. Jack Rejeski [3], Michael Marsiske [1]
[1] Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina
[2] Department of Internal Medicine, Section on Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina
[3] Department of Health and Exercise Science, Wake Forest University, Winston-Salem, North Carolina
[4] Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
[5] Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore
[6] Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pennsylvania
[7] Department of Aging and Geriatric Research, University of Florida, Gainesville

Abstract

Background: Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in the diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism about these methods persists because of limited reproducibility and the difficulty of understanding the complex algorithms that underlie the models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability.

Method: We discuss the underlying algorithms of the decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study.

Results: A decision tree is a machine learning method that produces a model resembling a flow chart. A random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. In the LIFE study data, the predictive performance of models for serious fall injury was moderate at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest).

Conclusions: Machine learning methods offer an alternative to traditional approaches for modeling outcomes in aging, but their use should be justified and their output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
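The three ideas named in the Results (a decision tree as a flow chart, a random forest as an aggregate of many trees, and the area under the receiver operating characteristic curve as an evaluation metric) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the predictors (gait speed, age, prior fall), the split thresholds, the three hand-written trees, and the eight toy records are hypothetical and are not taken from the LIFE study or from the authors' tutorial. The trees here are written by hand rather than learned from data, so this shows only how predictions are formed and scored, not how the models are fitted.

```python
# Hypothetical toy records, for illustration only (NOT LIFE study data):
# (gait_speed_m_per_s, age_years, prior_fall, serious_fall_injury)
records = [
    (0.6, 84, 1, 1),
    (0.7, 79, 0, 1),
    (0.9, 72, 0, 0),
    (1.1, 75, 1, 0),
    (0.8, 88, 1, 1),
    (1.0, 71, 0, 0),
    (0.7, 82, 0, 0),
    (1.2, 77, 0, 0),
]


def tree_a(gait, age, prior_fall):
    """A decision tree read as a flow chart: each 'if' is one split."""
    if gait < 0.8:              # hypothetical threshold
        return 1 if age >= 80 else 0
    return 0


def tree_b(gait, age, prior_fall):
    """A second hand-written tree that splits on different predictors."""
    if prior_fall == 1:
        return 1 if gait < 0.9 else 0
    return 0


def tree_c(gait, age, prior_fall):
    """A one-split tree (a 'stump') on age alone."""
    return 1 if age >= 78 else 0


FOREST = [tree_a, tree_b, tree_c]


def forest_score(gait, age, prior_fall):
    """Aggregate the trees: the fraction of trees that predict the event."""
    votes = [tree(gait, age, prior_fall) for tree in FOREST]
    return sum(votes) / len(votes)


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen event case scores higher than a randomly chosen non-event case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [r[3] for r in records]
tree_scores = [tree_a(*r[:3]) for r in records]
forest_scores = [forest_score(*r[:3]) for r in records]

print("single tree AUC:", round(auc(tree_scores, labels), 3))
print("forest AUC:     ", round(auc(forest_scores, labels), 3))
```

An area under the curve of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which is the scale on which the reported values of 0.54 and 0.66 should be read. The numbers printed by this sketch describe only the toy records and say nothing about performance on real data.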

