Quantile Regression Forests to Identify Determinants of Neighborhood Stroke Prevalence in 500 Cities in the USA: Implications for Neighborhoods with High Prevalence
Abstract
Stroke imposes a massive burden on health and the economy in the US. Place-based evidence is increasingly recognized as a critical part of stroke management, but the key determinants of neighborhood stroke prevalence and the mechanisms underlying their effects have received little attention in the literature. We aim to fill this research gap with a study focusing on urban health. We develop and apply analytical approaches to address two challenges. First, domain expertise on the drivers of neighborhood-level stroke outcomes is limited. Second, commonly used linear regression methods may yield incomplete and biased conclusions. We created a new neighborhood health data set at the census tract level by pooling information from multiple sources. We developed and applied a machine learning–based quantile regression method to uncover the neighborhood characteristics that matter most for stroke outcomes among vulnerable neighborhoods burdened with a high prevalence of stroke. Neighborhoods with a larger share of non-Hispanic Black residents, older adults, or people with insufficient sleep tended to have a higher prevalence of stroke, whereas neighborhoods with higher socioeconomic status in terms of income and education had a lower prevalence of stroke. The effects of these five major determinants varied geographically and were significantly stronger among neighborhoods with a high prevalence of stroke. Highly flexible machine learning can identify drivers of neighborhood cardiovascular health outcomes from wide-ranging information in an agnostic and reproducible way. The identified major determinants and effect mechanisms can provide important avenues for prioritizing and allocating resources to develop optimal community-level interventions for stroke prevention.
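For readers unfamiliar with the method named in the title, the following is a minimal Python sketch of the general idea behind a quantile regression forest: a random forest is fitted as usual, and its leaves are then used to weight the training outcomes so that conditional quantiles, rather than only the conditional mean, can be read off. This is not the authors' implementation. The synthetic data, the three unnamed features, the helper names `qrf_weights` and `weighted_quantile`, and the choice of scikit-learn are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for tract-level data (hypothetical, not the study data).
# The noise grows with the first feature, so upper quantiles of the outcome
# respond more strongly to that feature than the mean does.
n_train = 500
X_train = rng.uniform(0.0, 1.0, size=(n_train, 3))
noise_sd = 0.2 + 1.0 * X_train[:, 0]
y_train = (2.0 + 3.0 * X_train[:, 0] - 1.5 * X_train[:, 1]
           + rng.normal(0.0, noise_sd))

# An ordinary random forest; hyperparameters are arbitrary illustration values.
forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=5,
                               random_state=0)
forest.fit(X_train, y_train)


def qrf_weights(forest, X_train, x_new):
    """Weight of each training point for x_new: within each tree, training
    points sharing x_new's leaf get equal weight; weights are then averaged
    over trees."""
    train_leaves = forest.apply(X_train)                 # (n_train, n_trees)
    new_leaves = forest.apply(x_new.reshape(1, -1))[0]   # (n_trees,)
    same_leaf = train_leaves == new_leaves               # broadcast per tree
    per_tree = same_leaf / same_leaf.sum(axis=0, keepdims=True)
    return per_tree.mean(axis=1)


def weighted_quantile(y, weights, q):
    """Smallest y whose weighted empirical CDF reaches q."""
    order = np.argsort(y)
    cdf = np.cumsum(weights[order])
    idx = min(int(np.searchsorted(cdf, q, side="left")), len(y) - 1)
    return y[order][idx]


# Conditional quantiles of the outcome at one hypothetical neighborhood.
x_new = np.array([0.9, 0.5, 0.5])
weights = qrf_weights(forest, X_train, x_new)
q10, q50, q90 = (weighted_quantile(y_train, weights, q)
                 for q in (0.1, 0.5, 0.9))
print(f"10th: {q10:.2f}  median: {q50:.2f}  90th: {q90:.2f}")
```

Evaluating the upper quantiles (for example the 90th) across values of a single feature is one way to examine how a characteristic relates to outcomes among the highest-prevalence neighborhoods, which a mean-focused linear regression would not show.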