A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis

Biometrics - Volume 69, Issue 4 - Pages 1053-1063 - 2013
Fan Dora Xia1, Jun Chen2, Wing K. Fung1, Hongzhe Li2
1Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong
2Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.

Abstract

Changes in the human microbiome are associated with many human diseases. Next-generation sequencing technologies make it possible to quantify microbial composition without laboratory cultivation. An important problem in microbiome data analysis is to identify the environmental and biological covariates that are associated with different bacterial taxa. Taxa count data in microbiome studies are often over-dispersed and include many zeros. To account for this over-dispersion, we propose an additive logistic normal multinomial regression model that relates the covariates to bacterial composition. The model naturally accounts for sampling variability and zero observations, and it allows a flexible covariance structure among the bacterial taxa. To select the relevant covariates and estimate the corresponding regression coefficients, we propose a group penalized likelihood estimation method, which we implement with a Monte Carlo expectation-maximization algorithm. Our simulation results show that the proposed method outperforms group penalized multinomial logistic regression and Dirichlet multinomial regression in variable selection. We demonstrate the method on a data set that links the human gut microbiome to micro-nutrients, with the aim of identifying the nutrients associated with the human gut microbiome enterotype.
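To make the model structure concrete, the following is a minimal sketch of how counts could be generated under an additive logistic normal multinomial regression, together with a group-lasso-style penalty. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of the last taxon as the log-ratio reference, the fixed sequencing depth, and the grouping of coefficients by covariate (one row of `B` per group) are all hypothetical choices made here. The Monte Carlo expectation-maximization fitting step is not shown.

```python
import numpy as np


def alr_inverse(eta):
    """Map (n, K-1) additive log-ratios to (n, K) compositions.

    Assumption: the last taxon is the reference, so its log-ratio is 0.
    """
    n = eta.shape[0]
    padded = np.hstack([eta, np.zeros((n, 1))])
    padded = padded - padded.max(axis=1, keepdims=True)  # numerical stability
    expd = np.exp(padded)
    return expd / expd.sum(axis=1, keepdims=True)


def simulate_lnm(X, B, Sigma, depth, rng):
    """Draw taxa counts from a logistic normal multinomial model.

    X: (n, p) covariates; B: (p, K-1) coefficients;
    Sigma: (K-1, K-1) covariance of the latent log-ratios;
    depth: total count per sample (held fixed here for simplicity).
    """
    n = X.shape[0]
    mean = X @ B
    noise = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    probs = alr_inverse(mean + noise)  # latent composition per sample
    counts = np.vstack([rng.multinomial(depth, p) for p in probs])
    return counts, probs


def group_penalty(B, lam):
    """Group lasso penalty with one group per covariate (row of B).

    Assumption: a covariate is dropped when its whole row shrinks to zero.
    """
    return lam * np.sqrt((B ** 2).sum(axis=1)).sum()


# Hypothetical usage: 3 covariates, 4 taxa, only the first covariate active.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
B = np.zeros((3, 3))
B[0] = [1.0, -1.0, 0.5]
counts, probs = simulate_lnm(X, B, 0.5 * np.eye(3), depth=200, rng=rng)
```

The latent normal noise on the log-ratio scale is what produces over-dispersion relative to a plain multinomial, and the covariance matrix `Sigma` is where dependence among taxa enters. Zero counts arise naturally from the multinomial sampling step even though every latent proportion is strictly positive.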

