An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data

BMC Bioinformatics - Volume 18 - Pages 1-12 - 2017
W. Duncan Wadsworth (1), Raffaele Argiento (2), Michele Guindani (3), Jessica Galloway-Pena (4), Samuel A. Shelburne (5), Marina Vannucci (1)
(1) Department of Statistics, Rice University, Houston, USA
(2) ESOMAS Department, University of Torino and Collegio Carlo Alberto, Torino, Italy
(3) Department of Statistics, University of California, Irvine, USA
(4) Department of Infectious Disease, Infection Control, and Employee Health, The University of Texas MD Anderson Cancer Center, Houston, USA
(5) Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, USA

Abstract

The human microbiome has been associated with the immune-regulatory mechanisms involved in the prevention or development of many non-infectious human diseases, such as autoimmunity, allergy and cancer. Integrative approaches that associate the composition of the human microbiome with other available information, such as clinical covariates and environmental predictors, are essential for a more complete understanding of the role of the microbiome in disease development.

In this manuscript, we propose a Bayesian Dirichlet-multinomial regression model that uses spike-and-slab priors to select significant associations between a set of available covariates and the taxa in a microbiome abundance table. The covariates enter the model through a log-linear regression parametrization of the parameters of the Dirichlet-multinomial likelihood. Inference is conducted with a Markov chain Monte Carlo algorithm, and significant covariates are selected by assessing posterior probabilities of inclusion and thresholding the Bayesian false discovery rate. We design a simulation study to evaluate the performance of the proposed method, and then apply the model to a publicly available dataset from the Human Microbiome Project that associates taxa abundances with KEGG orthology pathways. The method is implemented in purpose-built R code, which has been made publicly available.

In simulations, our method compares favorably with several recently proposed approaches for similarly structured data, showing higher accuracy and lower false positive and false negative rates. In the application to the Human Microbiome Project data, a close evaluation of the biological significance of our findings confirms associations already reported in the literature.
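As an illustration of two ingredients named in the abstract, the Python sketch below (not the authors' R implementation) evaluates a Dirichlet-multinomial log-likelihood whose concentration parameters follow a log-linear regression on the covariates, and selects covariate-taxon associations by thresholding posterior probabilities of inclusion (PPIs) so that an estimate of the Bayesian false discovery rate stays below a target. The function names, the intercept-plus-slope layout, and the particular FDR estimate (the mean of 1 - PPI over the selected set) are assumptions made for this example; the abstract does not specify them.

```python
import math


def dm_loglik(counts, covariates, intercepts, coefs):
    """Dirichlet-multinomial log-likelihood of one sample's taxa counts.

    Concentrations are log-linear in the covariates:
        gamma_j = exp(intercepts[j] + sum_p coefs[j][p] * covariates[p])
    Hypothetical layout: coefs[j][p] is the effect of covariate p on taxon j.
    A coefficient excluded by a spike-and-slab prior would simply be 0 here.
    """
    gammas = [
        math.exp(a + sum(b * x for b, x in zip(row, covariates)))
        for a, row in zip(intercepts, coefs)
    ]
    n = sum(counts)
    g_sum = sum(gammas)
    # Multinomial coefficient: log n! - sum_j log y_j!
    ll = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    # Ratio of Dirichlet normalising constants (posterior over prior).
    ll += math.lgamma(g_sum) - math.lgamma(n + g_sum)
    ll += sum(math.lgamma(y + g) - math.lgamma(g) for y, g in zip(counts, gammas))
    return ll


def select_by_bayesian_fdr(ppis, target_fdr=0.05):
    """Return indices of associations kept at the given Bayesian FDR target.

    Assumed estimator: for a selected set S, FDR(S) = mean over S of (1 - PPI).
    Candidates are added in decreasing PPI order, so the running mean never
    decreases and the first violation marks the largest admissible set.
    """
    order = sorted(range(len(ppis)), key=lambda i: ppis[i], reverse=True)
    selected, error_sum = [], 0.0
    for k, i in enumerate(order, start=1):
        error_sum += 1.0 - ppis[i]
        if error_sum / k > target_fdr:
            break
        selected.append(i)
    return sorted(selected)
```

With all coefficients set to zero, every concentration equals 1. For a single read over two taxa the likelihood is then 1/2, which gives a quick sanity check on `dm_loglik`. In a full analysis the PPIs passed to `select_by_bayesian_fdr` would come from MCMC output, which this sketch does not attempt to reproduce.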
