A small‐sample kernel association test for correlated data with application to microbiome association studies

Genetic Epidemiology, Volume 42, Issue 8, Pages 772-782, 2018
Xiang Zhan (1), Lingzhou Xue (2), Haotian Zheng (3), Anna Plantinga (4), Michael C. Wu (4, 5), Daniel J. Schaid (6), Ni Zhao (7), Jun Chen (6)
(1) Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania
(2) Department of Statistics, Pennsylvania State University, University Park, Pennsylvania
(3) Department of Mathematical Sciences, Tsinghua University, Beijing, China
(4) Department of Biostatistics, University of Washington, Seattle, Washington
(5) Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington
(6) Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
(7) Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland

Abstract

Recent research has highlighted the importance of the human microbiome in many diseases and health conditions. Most current microbiome association analyses assume unrelated samples. Such methods are not appropriate for data collected under more advanced study designs, such as longitudinal and pedigree studies, in which outcomes can be correlated. Ignoring these correlations can sometimes lead to suboptimal results or even biased conclusions, so new methods are needed to handle correlated outcome data in microbiome association studies. In this paper, we propose the correlated sequence kernel association test (CSKAT), which accounts for such correlations using a linear mixed model: random effects capture the outcome correlations, and a variance component test assesses the microbiome effect. Unlike existing genetic association tests for longitudinal and family samples, CSKAT applies a correction procedure that better calibrates the null distribution of the score test statistic to the small sample sizes typical of microbiome studies. Comprehensive simulation studies demonstrate the validity and efficiency of our method: CSKAT achieves higher power than existing methods while correctly controlling the Type I error rate. We also apply our method to a microbiome data set collected from a UK twin study to illustrate its potential usefulness. A free implementation of our method in R is available at https://github.com/jchen1981/SSKAT.
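The abstract describes the test only in words. The following is a minimal sketch of the generic structure of a kernel variance-component score test under a linear mixed model. It is not the authors' CSKAT implementation. Several assumptions are worth flagging. The outcome covariance V (for example, a shared twin-pair random intercept plus residual noise) is treated as known, whereas in practice its components would be estimated under the null. The microbiome kernel is built from Bray-Curtis dissimilarities with Gower centering, a common choice in distance-based microbiome kernel tests but an assumption here. The p-value is a Monte Carlo estimate from the asymptotic weighted chi-square mixture, not the paper's small-sample correction. Both function names are hypothetical.

```python
import numpy as np


def bray_curtis_kernel(counts):
    """Hypothetical helper: n x p taxa counts -> n x n positive semidefinite kernel.

    Converts counts to proportions, computes pairwise Bray-Curtis dissimilarities D,
    applies Gower centering K = -1/2 J D^2 J, and clips negative eigenvalues to zero.
    """
    props = counts / counts.sum(axis=1, keepdims=True)
    n = props.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = (np.abs(props[i] - props[j]).sum()
                                 / (props[i] + props[j]).sum())
    centering = np.eye(n) - np.ones((n, n)) / n
    k = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(k)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T


def lmm_kernel_score_test(y, X, V, K, n_draws=20_000, seed=0):
    """Hypothetical helper: kernel variance-component score test with known Cov(y) = V.

    Q = y' P K P y, where P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1 satisfies P X = 0,
    so the fixed effects drop out. Under H0 (no kernel effect) and y ~ N(X beta, V),
    Q is distributed as sum_i lambda_i * chi2_1, with lambda the eigenvalues of
    L' P K P L for the Cholesky factor V = L L'. The p-value is a Monte Carlo
    estimate from that mixture (not the paper's small-sample correction).
    """
    V_inv = np.linalg.inv(V)
    VX = V_inv @ X
    P = V_inv - VX @ np.linalg.solve(X.T @ VX, VX.T)
    Q = float(y @ P @ K @ P @ y)
    L = np.linalg.cholesky(V)
    M = L.T @ P @ K @ P @ L
    lam = np.clip(np.linalg.eigvalsh((M + M.T) / 2.0), 0.0, None)
    rng = np.random.default_rng(seed)
    null_draws = (rng.standard_normal((n_draws, lam.size)) ** 2) @ lam
    p_value = (1 + np.sum(null_draws >= Q)) / (n_draws + 1)
    return Q, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_pairs, n_taxa = 20, 30
    n = 2 * n_pairs
    # Twin design (assumed): each pair shares a random intercept with variance 1,
    # plus independent residual noise with variance 1, so V = Z Z' + I.
    Z = np.kron(np.eye(n_pairs), np.ones((2, 1)))
    V = Z @ Z.T + np.eye(n)
    counts = rng.poisson(lam=rng.gamma(2.0, 5.0, size=(n, n_taxa))) + 1
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + covariate
    K = bray_curtis_kernel(counts)
    # Outcome simulated under H0: correlated within pairs, unrelated to the microbiome.
    y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(V) @ rng.standard_normal(n)
    Q, p = lmm_kernel_score_test(y, X, V, K)
    print(f"Q = {Q:.4f}, Monte Carlo p = {p:.4f}")
```

The Monte Carlo draw keeps the sketch dependent on numpy alone. R implementations of kernel tests typically evaluate this mixture with exact or moment-matching methods instead. The abstract's point is that, at typical microbiome sample sizes, the null distribution of the score statistic needs better calibration than standard approaches provide. This sketch does not attempt that correction.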
