A family of partial-linear single-index models for analyzing complex environmental exposures with continuous, categorical, time-to-event, and longitudinal health outcomes
Abstract
Statistical methods for studying the joint effects of environmental factors are essential for understanding the impact of correlated exposures that may act synergistically or antagonistically on health outcomes. This study proposes a family of statistical models under a unified partial-linear single-index (PLSI) modeling framework to assess the joint effects of environmental factors on continuous, categorical, time-to-event, and longitudinal outcomes.
All PLSI models combine the exposures linearly into a single index, which gives a practical interpretation of each exposure's relative direction and importance, and pass that index through a nonparametric link function for modeling flexibility. We present PLSI linear regression and PLSI quantile regression for continuous outcomes, PLSI generalized linear regression for categorical outcomes, a PLSI proportional hazards model for time-to-event outcomes, and a PLSI mixed-effects model for longitudinal outcomes. These models were demonstrated using a dataset of 800 subjects from the NHANES 2003–2004 survey that included 8 environmental factors. Serum triglyceride concentration was analyzed as a continuous outcome and then dichotomized as a binary outcome. Simulations were conducted to demonstrate the PLSI proportional hazards model and the PLSI mixed-effects model. The performance of the PLSI models was compared with that of their parametric counterparts.
PLSI linear, quantile, and logistic regressions gave similar results: the 8 environmental factors had both positive and negative associations with triglycerides, with α-tocopherol having the most positive association and trans-β-carotene the most negative. In the time-to-event and longitudinal settings, simulations showed that PLSI models could correctly identify the directions and relative importance of the 8 environmental factors. Compared with parametric models, PLSI models produced similar results when the link function was close to linear, but clearly outperformed them in simulations with nonlinear effects.
We presented a unified family of PLSI models to assess the joint effects of exposures on four commonly used types of outcomes in environmental research, and demonstrated their modeling flexibility and effectiveness, especially for studying environmental factors with mixed directional effects and/or nonlinear effects. Our study expands the analytical toolbox for investigating the complex effects of environmental factors. A practical contribution is a coherent algorithm for all proposed PLSI models, with R code available.
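To make the single-index idea in the abstract concrete, the following is a minimal Python sketch of a PLSI linear regression of the form y = g(x'β) + z'γ + ε with ‖β‖ = 1. It is not the authors' R implementation. The data are simulated, and every name in it (`beta_true`, `gamma_true`, `fit_given_beta`) is hypothetical. It also makes two simplifying assumptions: a cubic polynomial stands in for the nonparametric link g, and β is found by a grid search over the angle of a two-exposure index rather than by a general optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative assumption; not the NHANES data set).
n = 800
beta_true = np.array([0.8, -0.6])  # unit-norm index weights with mixed directions
gamma_true = 0.5                   # coefficient of the covariate that enters linearly
x = rng.normal(size=(n, 2))        # two exposures
z = rng.normal(size=n)             # one linear covariate
u_true = x @ beta_true             # the single index
y = u_true + 0.5 * u_true**2 + gamma_true * z + rng.normal(scale=0.5, size=n)


def fit_given_beta(beta):
    """For fixed index weights, fit a cubic link plus the linear term by least squares."""
    u = x @ beta
    design = np.column_stack([np.ones(n), u, u**2, u**3, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return rss, coef


# Profile the residual sum of squares over the index direction.
# Writing beta = (cos t, sin t) enforces the unit-norm constraint, and
# keeping cos t > 0 fixes the sign, which is otherwise not identifiable.
thetas = np.linspace(-np.pi / 2, np.pi / 2, 721)[1:-1]
best = None
for theta in thetas:
    beta = np.array([np.cos(theta), np.sin(theta)])
    rss, coef = fit_given_beta(beta)
    if best is None or rss < best[0]:
        best = (rss, beta, coef)

_, beta_hat, coef_hat = best
gamma_hat = float(coef_hat[-1])
print("estimated index weights:", np.round(beta_hat, 2))
print("estimated linear coefficient:", round(gamma_hat, 2))
```

The signs of the estimated weights give each exposure's direction within the index, and their magnitudes give relative importance, which is the interpretation the abstract describes. With more than two exposures, the grid search would need to be replaced by a constrained optimizer, and a spline or kernel smoother would be a more faithful stand-in for the nonparametric link.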
