A Dirichlet-Tree Multinomial Regression Model for Associating Dietary Nutrients with Gut Microorganisms

Biometrics - Volume 73, Issue 3 - Pages 792-801 - 2017
Tao Wang (1,2), Hongyu Zhao (3,2)
(1) Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
(2) SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
(3) Department of Biostatistics, Yale University, New Haven, Connecticut, U.S.A.

Abstract

Understanding the factors that alter the composition of the human microbiota may help inform personalized healthcare strategies and identify therapeutic drug targets. In many sequencing studies, microbial communities are characterized by a list of taxa, their counts, and their evolutionary relationships represented by a phylogenetic tree. In this article, we consider an extension of the Dirichlet multinomial distribution, called the Dirichlet-tree multinomial distribution, for multivariate, over-dispersed, and tree-structured count data. To relate these counts to a set of covariates, we propose the Dirichlet-tree multinomial regression model, for which we develop a penalized likelihood method for estimating parameters and selecting covariates. For efficient optimization, we adopt the accelerated proximal gradient approach. Simulation studies demonstrate the good performance of the proposed procedure. An analysis of a data set relating dietary nutrients to bacterial counts shows that incorporating the tree structure into the model helps increase prediction power.
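As a rough illustration of the model pieces named in the abstract, the Python sketch below computes a Dirichlet-tree multinomial log-likelihood using the standard factorization into one Dirichlet-multinomial term per internal node of the tree, each term splitting that node's total count among its children. Several details are assumptions for illustration and are not taken from the paper: the dictionary-based tree encoding, a log link alpha = exp(x . beta) for each edge, a group-lasso penalty that groups each covariate's coefficients across all edges (with column 0 treated as an unpenalized intercept), and all function names, which are hypothetical. Only the proximal operator of the penalty is shown; in an accelerated proximal gradient (FISTA-style) scheme it would be applied after each gradient step on the negative log-likelihood.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical tree encoding (assumption): each internal node maps to the
# ordered list of its children; any name that is not a key is a leaf (taxon).
Tree = Dict[str, List[str]]
# Hypothetical layout: beta[v][j] is the coefficient vector for the edge v -> child j.
Coefs = Dict[str, List[List[float]]]


def dm_loglik(y: Sequence[int], alpha: Sequence[float]) -> float:
    """Dirichlet-multinomial log-probability of counts y given parameters alpha."""
    n, a = sum(y), sum(alpha)
    out = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in y)
    out += math.lgamma(a) - math.lgamma(n + a)
    out += sum(math.lgamma(k + ak) - math.lgamma(ak) for k, ak in zip(y, alpha))
    return out


def node_count(node: str, tree: Tree, counts: Dict[str, int]) -> int:
    """Total count of the leaves below `node` (a leaf returns its own count)."""
    if node not in tree:
        return counts[node]
    return sum(node_count(c, tree, counts) for c in tree[node])


def dtm_loglik(counts: Dict[str, int], tree: Tree,
               alpha: Dict[str, List[float]]) -> float:
    """Dirichlet-tree multinomial log-probability: one Dirichlet-multinomial
    term per internal node, splitting that node's total among its children."""
    return sum(dm_loglik([node_count(c, tree, counts) for c in children], alpha[v])
               for v, children in tree.items())


def regression_alpha(x: Sequence[float], beta: Coefs) -> Dict[str, List[float]]:
    """Assumed log link: alpha for child j of node v is exp(x . beta[v][j])."""
    return {v: [math.exp(sum(xp * bp for xp, bp in zip(x, b))) for b in rows]
            for v, rows in beta.items()}


def penalized_nll(X: List[List[float]], Y: List[Dict[str, int]], tree: Tree,
                  beta: Coefs, lam: float) -> float:
    """Negative log-likelihood plus an assumed group-lasso penalty grouping
    covariate k's coefficients over all edges; column 0 is an unpenalized intercept."""
    nll = -sum(dtm_loglik(y, tree, regression_alpha(x, beta)) for x, y in zip(X, Y))
    pen = sum(math.sqrt(sum(b[k] ** 2 for rows in beta.values() for b in rows))
              for k in range(1, len(X[0])))
    return nll + lam * pen


def group_soft_threshold(group: List[float], t: float) -> List[float]:
    """Proximal operator of t * ||group||_2 (group soft-thresholding); in a
    proximal gradient step, t would be the step size times lam."""
    norm = math.sqrt(sum(g * g for g in group))
    if norm <= t:
        return [0.0] * len(group)
    return [(1.0 - t / norm) * g for g in group]


# Usage on a toy tree: root -> (n1, t3), n1 -> (t1, t2); all values are made up.
if __name__ == "__main__":
    tree = {"root": ["n1", "t3"], "n1": ["t1", "t2"]}
    beta = {"root": [[0.5, 0.2], [0.0, -0.1]], "n1": [[0.1, 0.0], [0.3, 0.4]]}
    X = [[1.0, 0.7], [1.0, -1.2]]
    Y = [{"t1": 3, "t2": 1, "t3": 2}, {"t1": 0, "t2": 4, "t3": 1}]
    print(penalized_nll(X, Y, tree, beta, lam=0.5))
    print(group_soft_threshold([3.0, 4.0], 1.0))
```

Because each node-level term is a proper distribution over the splits of that node's count, the factors multiply to a proper distribution over leaf counts, and a tree with a single internal node reduces to the ordinary Dirichlet-multinomial. Grouping a covariate's coefficients across all edges means the shrinkage step removes that covariate from the whole tree at once, which is one plausible way to realize covariate selection.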
