Dirichlet‐multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data

Molecular Ecology Resources, Volume 20, Issue 2, Pages 481-497, 2020
Joshua G. Harrison, W. John Calder, Vivaswat Shastry, C. Alex Buerkle
Department of Botany, University of Wyoming, Laramie, WY, USA

Abstract

Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modelled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups. This type of Dirichlet‐multinomial modelling (DMM) has been described previously, but its benefits and limitations are largely untested. With simulated data, we quantified the ability of DMM to detect differences in proportions between treatment and control groups, and compared the efficacy of three computational methods to implement DMM: Hamiltonian Monte Carlo (HMC), variational inference (VI), and Gibbs Markov chain Monte Carlo. We report that DMM was better able to detect shifts in relative abundances than analogous analytical tools, while identifying an acceptably low number of false positives. Among methods for implementing DMM, HMC provided the most accurate estimates of relative abundances, and VI was the most computationally efficient. The sensitivity of DMM was exemplified through analysis of previously published data describing lung microbiomes. We report that DMM identified several potentially pathogenic bacterial taxa as more abundant in the lungs of children who aspirated foreign material during swallowing; these differences went undetected with other statistical approaches. Our results suggest that DMM has strong potential as a statistical method to guide inference in molecular ecology.
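
To make the structure described in the abstract concrete, the following is a minimal sketch of a Dirichlet-multinomial simulation and a treatment-versus-control comparison. It is not the authors' implementation: the abstract reports fitting the hierarchical model with HMC, VI, and Gibbs sampling, whereas this sketch pools replicates within each group and uses the conjugate Dirichlet posterior as a shortcut, which ignores among-replicate variation and therefore overstates certainty. All function names, the concentration value, the group proportions, the flat prior, and the read depths are assumptions chosen for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) by normalising gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_multinomial(n, probs, rng):
    """Assign n reads to features with the given probabilities."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts

def simulate_group(group_props, concentration, n_replicates, n_reads, rng):
    """Replicate proportions ~ Dirichlet(concentration * group_props),
    then counts ~ Multinomial(n_reads, replicate proportions)."""
    alpha = [concentration * p for p in group_props]
    return [sample_multinomial(n_reads, sample_dirichlet(alpha, rng), rng)
            for _ in range(n_replicates)]

def posterior_draws(replicates, n_draws, rng, prior=1.0):
    """Simplifying assumption: pool replicates and sample the conjugate
    posterior Dirichlet(prior + pooled counts) for the group proportions."""
    pooled = [sum(column) for column in zip(*replicates)]
    alpha_post = [prior + c for c in pooled]
    return [sample_dirichlet(alpha_post, rng) for _ in range(n_draws)]

def prob_greater(draws_a, draws_b, feature):
    """Fraction of paired draws in which the feature is more abundant in a."""
    hits = sum(a[feature] > b[feature] for a, b in zip(draws_a, draws_b))
    return hits / len(draws_a)

rng = random.Random(42)
control_props = [0.50, 0.30, 0.15, 0.05]    # hypothetical control composition
treatment_props = [0.35, 0.30, 0.15, 0.20]  # feature 3 enriched in treatment

control = simulate_group(control_props, 50.0, 10, 2000, rng)
treatment = simulate_group(treatment_props, 50.0, 10, 2000, rng)

post_c = posterior_draws(control, 2000, rng)
post_t = posterior_draws(treatment, 2000, rng)

# Posterior probability that feature 3 is more abundant in treatment.
p_shift = prob_greater(post_t, post_c, feature=3)
print(f"P(feature 3 higher in treatment) = {p_shift:.3f}")
```

A probability near 0 or 1 indicates a credible shift in relative abundance for that feature. A full hierarchical fit would instead estimate replicate-level proportions and the Dirichlet parameters jointly, for example in Stan or JAGS.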
