An adaptive independence test for microbiome community data

Biometrics - Volume 76, Issue 2 - Pages 414-426 - 2020
Yaru Song1,2, Hongyu Zhao3,2, Tao Wang1,4,2
1Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
2SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
3Department of Biostatistics, Yale University, New Haven, Connecticut
4MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China

Abstract

Advances in sequencing technologies and bioinformatics tools have vastly improved our ability to collect and analyze data from complex microbial communities. A major goal of microbiome studies is to correlate the overall microbiome composition with clinical or environmental variables. La Rosa et al. recently proposed a parametric test for comparing microbiome populations between two or more groups of subjects. However, this method is not applicable to testing the association between the community composition and a continuous variable. Although multivariate nonparametric methods based on permutations are widely used in ecology studies, they lack interpretability and can be inefficient for analyzing microbiome data. We consider the problem of testing for independence between the microbial community composition and a continuous or many‐valued variable. By partitioning the range of the variable into a few slices, we recast the problem as one of comparing multiple groups of microbiome samples, with each group indexed by a slice. To model multivariate and over‐dispersed count data, we use the Dirichlet‐multinomial distribution. We propose an adaptive likelihood‐ratio test that learns a good partition, or slicing scheme, from the data. A dynamic programming algorithm is developed for the numerical optimization. We demonstrate the superiority of the proposed test by numerically comparing it with the test of La Rosa et al. and with other popular approaches to the same problem, including PERMANOVA, the distance covariance test, and the microbiome regression‐based kernel association test. We further apply it to test the association of the gut microbiome with age in three geographically distinct populations and show how the learned partition facilitates differential abundance analysis.
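The slicing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the samples are already sorted by the continuous variable and scores each candidate slice with a Dirichlet‐multinomial log‐likelihood. The parameters are plug‐in values (pooled proportions times a fixed `concentration`), not a maximum‐likelihood fit, and a constant `penalty` per slice stands in for whatever regularization the paper uses. The function names and the parameters `concentration`, `pseudo`, `penalty`, and `min_size` are all hypothetical.

```python
import math


def dm_loglik(counts, alpha):
    """Dirichlet-multinomial log-likelihood of one count vector.

    The multinomial coefficient is omitted: it does not depend on alpha,
    so it cancels when slicing schemes are compared.
    """
    n, a = sum(counts), sum(alpha)
    ll = math.lgamma(a) - math.lgamma(a + n)
    for x, al in zip(counts, alpha):
        ll += math.lgamma(al + x) - math.lgamma(al)
    return ll


def slice_score(samples, concentration=10.0, pseudo=0.5):
    """Plug-in log-likelihood of one slice (assumption: not an MLE fit)."""
    k = len(samples[0])
    # Pooled taxon totals; the pseudo-count keeps every alpha strictly positive.
    totals = [sum(s[j] for s in samples) + pseudo for j in range(k)]
    grand = sum(totals)
    alpha = [concentration * t / grand for t in totals]
    return sum(dm_loglik(s, alpha) for s in samples)


def best_slicing(samples, penalty=2.0, min_size=2):
    """Dynamic program over samples sorted by the continuous variable.

    best[j] is the best penalized score for the first j samples;
    back[j] is the start index of the last slice in that solution.
    Returns (score, list of slice end indices).
    """
    n = len(samples)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(min_size, n + 1):
        for i in range(0, j - min_size + 1):
            if best[i] == -math.inf:
                continue  # no valid slicing ends at i
            cand = best[i] + slice_score(samples[i:j]) - penalty
            if cand > best[j]:
                best[j], back[j] = cand, i
    # Walk the back-pointers to recover the slice boundaries.
    cuts, j = [], n
    while j > 0:
        cuts.append(j)
        j = back[j]
    return best[n], cuts[::-1]


# Toy data: two taxa whose dominance flips halfway along the sorted variable.
samples = [[18, 2]] * 4 + [[2, 18]] * 4
score, cuts = best_slicing(samples)
print(cuts)
```

The double loop evaluates O(n²) candidate slices. A likelihood‐ratio statistic in the spirit of the abstract would compare the returned score with the score of a single slice covering all samples; calibrating that statistic is beyond this sketch.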

References

10.1111/j.2517-6161.1995.tb02031.x

10.3390/d5030627

10.1198/016214501753381850

10.2307/1942268

10.1038/nmeth.f.303

10.1038/nrg3182

10.1080/01621459.1969.10500963

10.1186/s12263-017-0566-2

10.1073/pnas.74.10.4537

10.1371/journal.pone.0030126

10.1080/01621459.2014.920257

10.1016/j.chom.2011.09.003

10.1186/s40168-017-0262-x

10.1371/journal.pone.0052078

10.1146/annurev-statistics-010814-020351

10.1080/01621459.1991.10475035

10.1186/s13059-014-0550-8

10.1128/AEM.71.12.8228-8235.2005

Magurran A.E., 2004, Measuring Biological Diversity

10.1890/0012-9658(2001)082[0290:FMMTCD]2.0.CO;2

Mosimann J.E., 1962, On the compound multinomial distribution, the multivariate β‐distribution and correlations among proportions, Biometrika, 49, 65

10.1016/j.cell.2016.08.007

10.1128/JB.188.4.1260-1265.2006

10.1038/nature25973

10.1128/AEM.71.3.1501-1506.2005

10.1214/09-AOAS312

10.1038/nm.4142

10.1093/biostatistics/kxy025

10.1093/bioinformatics/btw311

Tang Z.‐Z., 2017, A general framework for association analysis of microbial communities on a taxonomic tree, Bioinformatics, 33, 1278, 10.1093/bioinformatics/btw804

10.1016/j.tpb.2010.07.002

10.1111/biom.12654

10.1146/annurev.genet.36.050802.093940

10.1186/s13073-016-0302-3

10.1126/science.1208344

10.1038/nature11053

10.1016/j.ajhg.2015.04.003

10.1126/science.aao5774