Negative binomial mixed models for analyzing microbiome count data

BMC Bioinformatics - Volume 18 - Pages 1-10 - 2017
Xinyan Zhang [1], Himel Mallick [2,3], Zaixiang Tang [4], Lei Zhang [4], Xiangqin Cui [1], Andrew K. Benson [5], Nengjun Yi [1]
[1] Department of Biostatistics, University of Alabama at Birmingham, Birmingham, USA
[2] Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, USA
[3] Program in Medical and Population Genetics, the Broad Institute, Cambridge, USA
[4] Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
[5] Department of Food Science and Technology and Core for Applied Genomics and Ecology, University of Nebraska, Lincoln, USA

Abstract

Recent advances in next-generation sequencing (NGS) technology enable researchers to collect large volumes of metagenomic sequencing data. These data are a valuable resource for investigating interactions between the microbiome and host environmental/clinical factors. Microbiome count measurements have several well-known properties, such as total sequence reads that vary across samples, over-dispersion, and zero-inflation. In addition, microbiome studies usually collect samples with hierarchical structures, which introduce correlation among the samples and thus further complicate the analysis and interpretation of the count data. In this article, we propose negative binomial mixed models (NBMMs) for detecting associations between the microbiome and host environmental/clinical factors in correlated microbiome count data. Although they do not address zero-inflation, the proposed mixed-effects models account for correlation among the samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and they can efficiently handle over-dispersion and varying total reads. We have developed a flexible and efficient IWLS (Iterative Weighted Least Squares) algorithm to fit the proposed NBMMs by taking advantage of the standard procedure for fitting linear mixed models. We evaluate and demonstrate the proposed method through extensive simulation studies and an application to mouse gut microbiome data. The results show that the proposed method has desirable properties and outperforms previously used methods in terms of both empirical power and Type I error. The method has been incorporated into the freely available R package BhGLM ( http://www.ssg.uab.edu/bhglm/ and http://github.com/abbyyan3/BhGLM ), providing a useful tool for analyzing microbiome data.
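
The model class described in the abstract can be sketched in equations. The notation below is assumed for illustration and is not defined in the abstract: y_i is the count of a taxon in sample i, T_i is the total number of reads for that sample (entering as an offset), X_i and Z_i are the fixed-effects and random-effects design rows, θ is the negative binomial shape (dispersion) parameter, and Ψ is the random-effects covariance. The last line gives the working response z_i and weight w_i that a standard IWLS step would use for a negative binomial model with a log link.

```latex
% Notation is assumed for illustration; it does not appear in the abstract.
% y_i: taxon count in sample i, T_i: total reads (offset),
% X_i, Z_i: fixed- and random-effects design rows.
\begin{aligned}
y_i \mid b &\sim \mathrm{NB}(\mu_i, \theta),
  & \operatorname{Var}(y_i \mid b) &= \mu_i + \mu_i^2/\theta, \\
\log \mu_i &= \log T_i + X_i\beta + Z_i b,
  & b &\sim N(0, \Psi), \\
z_i &= X_i\beta + Z_i b + \frac{y_i - \mu_i}{\mu_i},
  & w_i &= \frac{\mu_i}{1 + \mu_i/\theta}.
\end{aligned}
```

Under this sketch, each iteration treats z_i as approximately normal with mean X_i β + Z_i b and variance proportional to 1/w_i, so β, b, and Ψ can be updated by fitting a weighted linear mixed model, which is the sense in which the fitting reuses standard linear mixed model machinery. How θ is updated between iterations is not stated in the abstract; re-estimating it given the current μ_i is one common choice and is only an assumption here.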
