Detecting Shape-Based Interactions Among Environmental Chemicals Using an Ensemble of Exposure-Mixture Regression and Interpretable Machine Learning Tools
Statistics in Biosciences - Pages 1-21 - 2023
Abstract
There is growing interest in discovering interactions between multiple environmental chemicals that are associated with increased adverse health effects. However, most existing approaches (1) rely on a projection or product of multiple chemical exposures, which is difficult to interpret, and (2) cannot simultaneously handle interactions of multiple orders. We therefore developed and validated a method to discover shape-based interactions that mimic usual toxicological interactions. The Multi-ordered explanatory interaction (Moxie) algorithm merges the efficacy of Extreme Gradient Boosting with the inferential power of Weighted Quantile Sum regression to extract synergistic interactions associated with the outcome or odds of disease in an adverse direction. We evaluated the algorithm’s performance through simulations and compared it with the currently available gold standard, the signed-iterative random forest algorithm. We used the 2017–18 US-NHANES dataset (n = 447 adults) to evaluate interactions among nine per- and poly-fluoroalkyl substances and five metals measured in whole blood in association with serum low-density lipoprotein cholesterol. In simulations, the Moxie algorithm was highly specific and sensitive and had very low false discovery rates in detecting true synergistic interactions of 2nd, 3rd, and 4th order at moderate (n = 250) to large (n = 1000) sample sizes. In the NHANES data, we found a second-order synergistic interaction between cadmium and lead, detected in people with whole-blood cadmium and lead concentrations above 0.605 µg/dL and 1.485 µg/dL, respectively. Our findings demonstrate a novel, validated approach in environmental epidemiology for detecting shape-based interactions that mimic toxicological interactions by integrating exposure-mixture regression and machine learning methods.
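The threshold form of the reported cadmium-lead finding can be illustrated with a short sketch. This is not the Moxie algorithm itself. It only shows how a detected interaction of this kind might be encoded as a joint-threshold indicator, which is 1 when every chemical exceeds its cutoff. The function name `joint_threshold_indicator` and the sample exposure values are hypothetical; only the two cutoffs come from the abstract.

```python
from typing import Mapping

# Cutoffs quoted in the abstract (units as reported there).
CADMIUM_CUTOFF = 0.605
LEAD_CUTOFF = 1.485


def joint_threshold_indicator(
    exposures: Mapping[str, float], cutoffs: Mapping[str, float]
) -> int:
    """Return 1 if every listed exposure is strictly above its cutoff, else 0."""
    return int(all(exposures[name] > cut for name, cut in cutoffs.items()))


cutoffs = {"cadmium": CADMIUM_CUTOFF, "lead": LEAD_CUTOFF}

# Hypothetical exposure profiles, made up purely for illustration.
people = [
    {"cadmium": 0.70, "lead": 1.60},  # both above their cutoffs
    {"cadmium": 0.70, "lead": 1.00},  # only cadmium above
    {"cadmium": 0.30, "lead": 2.00},  # only lead above
]

indicators = [joint_threshold_indicator(p, cutoffs) for p in people]
print(indicators)
```

An indicator of this form could then be entered as a covariate in an outcome regression. Whether the authors' pipeline does exactly this is an assumption here.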