A latent allocation model for the analysis of microbial composition and disease

BMC Bioinformatics - Volume 19 - Pages 171-177 - 2018
Ko Abe1, Masaaki Hirayama2, Kinji Ohno3, Teppei Shimamura1
1Division of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya, Japan
2School of Health Sciences, Nagoya University Graduate School of Medicine, Nagoya, Japan
3Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan

Abstract

Establishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology. A distinctive feature of microbiome count data is the large number of zeros, which makes the data difficult to analyze in case-control studies. Most existing approaches either add a small number, called a pseudo-count, or use probability models such as the multinomial and Dirichlet-multinomial distributions to explain the excess zero counts; these choices may introduce unnecessary bias and impose a correlation structure that is unsuitable for microbiome data. The purpose of this article is to develop a new probabilistic model, called BERnoulli and MUltinomial Distribution-based latent Allocation (BERMUDA), to address these problems. BERMUDA enables us to describe differences in bacterial composition among samples and their association with a given disease. We also provide a simple and efficient learning procedure for the proposed model using an annealing EM algorithm. We illustrate the performance of the proposed method through both simulation and real data analysis. BERMUDA is implemented in R and is available from GitHub (https://github.com/abikoushi/Bermuda).
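To make the two ingredients named in the abstract more concrete, the following is a minimal Python sketch, not the BERMUDA model or its R implementation. It assumes (a) zero-inflated counts can be simulated by a Bernoulli presence mask followed by a multinomial draw over the taxa that are present, and (b) a generic mixture of multinomials can be fitted by a deterministic annealing EM loop in which the E-step posteriors are raised to an inverse temperature beta that increases to 1. The function names `simulate_counts` and `daem_multinomial_mixture`, the temperature schedule, and the smoothing constants are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_counts(rng, n_samples, probs, presence, depth):
    """Simulate zero-inflated counts (illustrative assumption, not BERMUDA).

    A Bernoulli mask decides which taxa are present in each sample; reads
    are then drawn from a multinomial restricted to the present taxa.
    """
    counts = np.zeros((n_samples, probs.size), dtype=int)
    for i in range(n_samples):
        mask = rng.random(probs.size) < presence
        if not mask.any():
            mask[rng.integers(probs.size)] = True  # keep at least one taxon
        p = probs * mask
        counts[i] = rng.multinomial(depth, p / p.sum())
    return counts


def daem_multinomial_mixture(X, n_clusters, betas=(0.2, 0.5, 0.8, 1.0),
                             n_iter=50, seed=0):
    """Deterministic annealing EM for a plain mixture of multinomials."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(n_clusters, 1.0 / n_clusters)
    theta = rng.dirichlet(np.ones(d), size=n_clusters)
    for beta in betas:  # anneal from a smooth objective toward ordinary EM
        for _ in range(n_iter):
            # Tempered E-step: responsibilities proportional to
            # (weight_k * p(x | theta_k)) ** beta. The multinomial
            # coefficient is constant across clusters and cancels.
            log_joint = np.log(weights) + X @ np.log(theta).T
            scaled = beta * log_joint
            scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
            resp = np.exp(scaled)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: weighted proportions, lightly smoothed to avoid log(0).
            weights = np.clip(resp.mean(axis=0), 1e-12, None)
            weights /= weights.sum()
            theta = resp.T @ X + 1e-6
            theta /= theta.sum(axis=1, keepdims=True)
    return weights, theta, resp
```

Calling `simulate_counts(np.random.default_rng(1), 20, np.array([0.5, 0.3, 0.1, 0.1]), 0.7, 100)` yields a 20 by 4 count matrix with many exact zeros, which can be passed directly to `daem_multinomial_mixture(X, 2)`. With beta fixed at 1 the loop reduces to ordinary EM; starting from a smaller beta is intended to make the fit less sensitive to initialization.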
