A GLM-based Latent Variable Ordination Method for Microbiome Samples

Biometrics - Volume 74, Issue 2 - Pages 448-457 - 2018
Michael B. Sohn¹, Hongzhe Li¹
¹ Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, U.S.A.

Abstract

Distance-based ordination methods, such as principal coordinates analysis (PCoA), are widely used in the analysis of microbiome data. However, when populations differ in dispersion, these methods risk misinterpretation, because a difference in dispersion can be mistaken for a difference in composition between samples from different populations. To account for the high sparsity and overdispersion of microbiome data, we propose in this article a GLM-based Ordination Method for Microbiome Samples (GOMMS). The method uses a zero-inflated quasi-Poisson (ZIQP) latent factor model, and its parameters are estimated with an EM algorithm based on the quasi-likelihood. GOMMS performs comparably to the distance-based approach when dispersion effects are negligible and consistently better when dispersion effects are strong, a setting in which the distance-based approach sometimes yields undesirable results. The latent factors estimated by GOMMS can be used with standard multivariate tests to associate the microbiome community with covariates or outcomes, and the associations found can then be investigated in future confirmatory experiments. We illustrate the method in simulations and in an analysis of microbiome samples from the nasopharynx and oropharynx.
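The abstract describes GOMMS only at a high level. As a hedged illustration, the Python/NumPy sketch below fits a simplified zero-inflated latent factor model by EM and returns ordination scores. It is not the authors' algorithm. The function name `fit_zi_quasi_poisson_factors`, the library-size offset, the use of the Poisson zero probability exp(-mu) in the E-step, the damped diagonal Newton updates, the Pearson dispersion estimate and the final SVD rotation are all assumptions made for this example. It does show one consequence of the quasi-Poisson variance Var(Y_ij) = phi_j * mu_ij: a per-taxon dispersion phi_j cancels from the taxon-level estimating equations, but it weights each taxon by 1/phi_j when the sample-level latent factors are updated.

```python
import numpy as np


def fit_zi_quasi_poisson_factors(Y, k=2, n_iter=200, damping=0.5, seed=0):
    """Illustrative EM sketch for a zero-inflated quasi-Poisson factor model.

    Assumed model (hypothetical, NOT the published GOMMS algorithm):
      Y_ij = 0 with probability pi_j (structural zero); otherwise Y_ij has
      mean mu_ij and variance phi_j * mu_ij, where
      log mu_ij = offset_i + beta_j + u_i . v_j.
    Returns (n x k) ordination scores and the fitted parameters.
    """
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    rng = np.random.default_rng(seed)
    lib = np.maximum(Y.sum(axis=1), 1.0)
    offset = np.log(lib / lib.mean())      # library-size offset (an assumption)
    beta = np.log(Y.mean(axis=0) + 0.5)    # taxon intercepts
    U = 0.1 * rng.standard_normal((n, k))  # sample latent factors (scores)
    V = 0.1 * rng.standard_normal((p, k))  # taxon loadings
    pi = np.full(p, 0.1)                   # structural-zero probabilities
    phi = np.ones(p)                       # quasi-Poisson dispersions
    is_zero = Y == 0

    def mean_of(beta, U, V):
        eta = offset[:, None] + beta[None, :] + U @ V.T
        return np.exp(np.clip(eta, -30.0, 30.0))

    for _ in range(n_iter):
        # E-step: posterior probability that each entry comes from the count
        # component. A quasi-likelihood has no explicit P(Y = 0), so the
        # Poisson value exp(-mu) is used here as a stand-in.
        mu = mean_of(beta, U, V)
        p0 = (1.0 - pi) * np.exp(-mu)
        w = np.where(is_zero, p0 / (pi + p0), 1.0)

        # M-step, zero inflation: average posterior structural-zero mass.
        pi = np.clip(1.0 - w.mean(axis=0), 1e-6, 1.0 - 1e-6)

        # M-step, intercepts: closed-form maximizer of the weighted Poisson
        # score; phi_j is constant within a taxon, so it cancels here.
        lin = np.exp(np.clip(offset[:, None] + U @ V.T, -30.0, 30.0))
        beta = np.log((w * Y).sum(axis=0) + 1e-8) - np.log((w * lin).sum(axis=0) + 1e-12)

        # Loadings: damped, clipped diagonal Newton step (phi_j cancels again).
        mu = mean_of(beta, U, V)
        grad_V = (w * (Y - mu)).T @ U
        hess_V = (w * mu).T @ (U ** 2) + 1e-8
        V = V + damping * np.clip(grad_V / hess_V, -1.0, 1.0)

        # Sample factors: this quasi-score sums over taxa, so each taxon's
        # contribution is weighted by 1 / phi_j.
        mu = mean_of(beta, U, V)
        wq = w / phi[None, :]
        grad_U = (wq * (Y - mu)) @ V
        hess_U = (wq * mu) @ (V ** 2) + 1e-8
        U = U + damping * np.clip(grad_U / hess_U, -1.0, 1.0)

        # Pearson (moment) estimate of phi_j; the degrees-of-freedom
        # correction is a rough heuristic.
        mu = mean_of(beta, U, V)
        pearson = (w * (Y - mu) ** 2 / mu).sum(axis=0)
        phi = np.maximum(pearson / np.maximum(w.sum(axis=0) - 1 - k, 1.0), 1e-3)

    # Rotate the fitted low-rank term to orthogonal, PCA-style axes; its
    # column means belong with the intercepts, so they are removed first.
    M = U @ V.T
    M = M - M.mean(axis=0)
    left, sing, _ = np.linalg.svd(M, full_matrices=False)
    scores = left[:, :k] * sing[:k]
    return scores, {"beta": beta, "U": U, "V": V, "pi": pi, "phi": phi}


# Demo on simulated data: two groups whose first 10 taxa differ in mean,
# overdispersed (gamma-Poisson) counts, and about 20% structural zeros.
rng = np.random.default_rng(1)
n, p = 40, 30
group = np.repeat([0, 1], n // 2)
shift = np.where(np.arange(p) < 10, 1.5, 0.0)
log_mu = rng.normal(1.0, 1.0, p)[None, :] + np.outer(group, shift)
lam = rng.gamma(shape=2.0, scale=np.exp(log_mu) / 2.0)
Y = rng.poisson(lam) * (rng.random((n, p)) > 0.2)
scores, fit = fit_zi_quasi_poisson_factors(Y, k=2)
```

In this sketch the returned `scores` play the role of ordination coordinates. Plotting them coloured by `group`, or passing them to a standard multivariate test, corresponds to the kind of downstream use the abstract describes, though the simulated data and settings here are purely illustrative.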
