Robust and Powerful Differential Composition Tests for Clustered Microbiome Data
Abstract
Thanks to advances in high-throughput sequencing technologies, the importance of the microbiome to human health and disease has been increasingly recognized. Analyzing microbiome data from sequencing experiments is challenging because of their unique features, such as compositionality, excessive zero observations, overdispersion, and complex relations among microbial taxa. Clustered microbiome data have become prevalent in recent years, arising from designs such as longitudinal studies, family studies, and matched case–control studies. The within-cluster dependence compounds the challenge of microbiome data analysis. Methods that properly accommodate both intra-cluster correlation and the features of microbiome data are needed. We develop robust and powerful differential composition tests for clustered microbiome data. The methods do not rely on any distributional assumptions on the microbial compositions, which provides flexibility to model various correlation structures among taxa and among samples within a cluster. By leveraging the adjusted sandwich covariance estimate, the methods properly accommodate sample dependence within a cluster. The two-part version of the test can further improve power in the presence of excessive zero observations. Different types of confounding variables can be easily adjusted for in the methods. We perform extensive simulation studies under commonly adopted clustered data designs to evaluate the methods. We demonstrate that the methods properly control the type I error under all designs and are more powerful than existing methods in many scenarios. The usefulness of the proposed methods is further demonstrated with two real datasets from longitudinal microbiome studies on pregnant women and inflammatory bowel disease patients. The methods have been incorporated into the R package “miLineage”, publicly available at https://tangzheng1.github.io/tanglab/software.html.
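As a rough illustration of the sandwich covariance idea mentioned above, the following Python sketch computes a cluster-robust covariance for ordinary least squares coefficients. It is not the authors' test and does not use the miLineage package: the function name `cluster_sandwich_cov`, the simulated data, and the use of a plain (unadjusted) sandwich estimate for a linear model are all assumptions made for illustration only.

```python
import numpy as np

def cluster_sandwich_cov(X, y, cluster_ids):
    """Return OLS coefficients and a cluster-robust (sandwich) covariance.

    Hypothetical helper for illustration; no small-sample adjustment is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)

    # Point estimate from ordinary least squares.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # "Bread": inverse of X'X.
    bread = np.linalg.inv(X.T @ X)

    # "Meat": outer products of score contributions summed within each cluster,
    # so that arbitrary dependence inside a cluster is allowed.
    p = X.shape[1]
    meat = np.zeros((p, p))
    for c in np.unique(cluster_ids):
        idx = cluster_ids == c
        score_c = X[idx].T @ resid[idx]
        meat += np.outer(score_c, score_c)

    return beta, bread @ meat @ bread

# Simulated clustered data (assumed design: 50 clusters of 4 samples each).
rng = np.random.default_rng(0)
n_clusters, per_cluster = 50, 4
cluster_ids = np.repeat(np.arange(n_clusters), per_cluster)
x = rng.normal(size=n_clusters * per_cluster)
cluster_effect = np.repeat(rng.normal(size=n_clusters), per_cluster)
y = 1.0 + 0.5 * x + cluster_effect + rng.normal(size=x.size)

X = np.column_stack([np.ones_like(x), x])
beta, cov = cluster_sandwich_cov(X, y, cluster_ids)
robust_se = np.sqrt(np.diag(cov))
```

Summing the score contributions within each cluster before taking outer products is what lets the estimate tolerate correlation among samples from the same cluster without specifying its form.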