An Adaptive Multivariate Two-Sample Test With Application to Microbiome Differential Abundance Analysis

Kalins Banerjee1, Ni Zhao2, A. Srinivasan3, Lingzhou Xue3, Steven D. Hicks4, Frank A. Middleton5, Rongling Wu1, Xiang Zhan1
1Department of Public Health Sciences, Pennsylvania State University, Hershey, PA, United States
2Department of Biostatistics, Johns Hopkins University, Baltimore, MD, United States
3Department of Statistics, Pennsylvania State University, University Park, PA, United States
4Department of Pediatrics, Pennsylvania State University, Hershey, PA, United States
5Department of Neuroscience, State University of New York Upstate Medical University, Syracuse, NY, United States

Abstract

Keywords


References

Ainsworth, 2017, k-SLAM: accurate and ultra-fast taxonomic classification and gene identification for large metagenomic data sets, Nucleic Acids Res., 45, 1649, 10.1093/nar/gkw1248

Aitchison, 1982, The statistical analysis of compositional data, J. R. Stat. Soc. Ser. B, 44, 139, 10.1111/j.2517-6161.1982.tb01195.x

Anders, 2010, Differential expression analysis for sequence count data, Genome Biol., 11, R106, 10.1186/gb-2010-11-10-r106

Aitchison, 1980, Logistic-normal distributions: Some properties and uses, Biometrika, 67, 261, 10.1093/biomet/67.2.261

Bai, 1996, Effect of high dimension: by an example of a two sample problem, Stat. Sin., 6, 311

Barber, 2015, Controlling the false discovery rate via knockoffs, Ann. Stat., 43, 2055, 10.1214/15-AOS1337

Benjamini, 1995, Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. R. Stat. Soc. Ser. B, 57, 289, 10.1111/j.2517-6161.1995.tb02031.x

Benjamini, 2001, The control of the false discovery rate in multiple testing under dependency, Ann. Stat., 29, 1165, 10.1214/aos/1013699998

Cai, 2012, Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test, Biostatistics, 13, 776, 10.1093/biostatistics/kxs015

Cai, 2014, Two-sample test of high dimensional means under dependence, J. R. Stat. Soc. Ser. B, 76, 349, 10.1111/rssb.12034

Candes, 2018, Panning for gold: model-X knockoffs for high dimensional controlled variable selection, J. R. Stat. Soc. Ser. B, 80, 551, 10.1111/rssb.12265

Cao, 2017, Two-sample tests of high-dimensional means for compositional data, Biometrika, 105, 115, 10.1093/biomet/asx060

Chen, 2016, Small sample kernel association tests for human genetic and microbiome association studies, Genet. Epidemiol., 40, 5, 10.1002/gepi.21934

Chen, 2017, An omnibus test for differential distribution analysis of microbiome sequencing data, Bioinformatics, 34, 643, 10.1093/bioinformatics/btx650

Chen, 2010, A two-sample test for high-dimensional data with applications to gene-set testing, Ann. Stat., 38, 808, 10.1214/09-AOS716

Gretton, 2007, A kernel method for the two-sample problem, NIPS, 520

Gretton, 2012, A kernel two-sample test, J. Mach. Learn. Res., 13, 723

Hawinkel, 2017, A broken promise: microbiome differential abundance methods do not control the false discovery rate, Brief. Bioinform., 20, 210, 10.1093/bib/bbx104

Hicks, 2018, Oral microbiome activity in children with autism spectrum disorder, Autism Res., 11, 1286, 10.1002/aur.1972

Koh, 2017, A powerful microbiome-based association test and a microbial taxa discovery framework for comprehensive association mapping, Microbiome, 5, 45, 10.1186/s40168-017-0262-x

Li, 2015, Microbiome, metagenomics and high-dimensional compositional data analysis, Annu. Rev. Stat. Appl., 2, 73, 10.1146/annurev-statistics-010814-020351

Louis, 2014, The gut microbiota, bacterial metabolites and colorectal cancer, Nat. Rev. Microbiol., 12, 661, 10.1038/nrmicro3344

McArdle, 2001, Fitting multivariate models to community data: a comment on distance-based redundancy analysis, Ecology, 82, 290, 10.1890/0012-9658(2001)082<0290:FMMTCD>2.0.CO;2

McMurdie, 2014, Waste not, want not: why rarefying microbiome data is inadmissible, PLoS Comput. Biol., 10, e1003531, 10.1371/journal.pcbi.1003531

Mitchell, 2017, Vaginal microbiota and genitourinary menopausal symptoms: a cross-sectional analysis, Menopause, 24, 1160, 10.1097/GME.0000000000000904

Morgan, 2015, Associations between host gene expression, the mucosal microbiome, and clinical outcome in the pelvic pouch of patients with inflammatory bowel disease, Genome Biol., 16, 67, 10.1186/s13059-015-0637-x

Pan, 2014, A powerful and adaptive association test for rare variants, Genetics, 197, 1081, 10.1534/genetics.114.165035

Pan, 2015, A powerful pathway-based adaptive test for genetic association with common or rare variants, Am. J. Hum. Genet., 97, 86, 10.1016/j.ajhg.2015.05.018

Plantinga, 2017, MiRKAT-S: a community-level test of association between the microbiota and survival times, Microbiome, 5, 17, 10.1186/s40168-017-0239-9

Qin, 2012, A metagenome-wide association study of gut microbiota in type 2 diabetes, Nature, 490, 55, 10.1038/nature11450

Robinson, 2010, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics, 26, 139, 10.1093/bioinformatics/btp616

Sohn, 2015, A robust approach for identifying differentially abundant features in metagenomic samples, Bioinformatics, 31, 2269, 10.1093/bioinformatics/btv165

Tang, 2016, PERMANOVA-S: association test for microbial community composition that accommodates confounders and multiple distances, Bioinformatics, 32, 2618, 10.1093/bioinformatics/btw311

Tang, 2017, A general framework for association analysis of microbial communities on a taxonomic tree, Bioinformatics, 33, 1278, 10.1093/bioinformatics/btw804

Tibshirani, 1996, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B, 58, 267, 10.1111/j.2517-6161.1996.tb02080.x

Turnbaugh, 2009, A core gut microbiome in obese and lean twins, Nature, 457, 480, 10.1038/nature07540

Virgin, 2011, Metagenomics and personalized medicine, Cell, 147, 44, 10.1016/j.cell.2011.09.009

Wang, 2016, Metagenome-wide association studies: fine-mining the microbiome, Nat. Rev. Microbiol., 14, 508, 10.1038/nrmicro.2016.83

Weiss, 2017, Normalization and microbial differential abundance strategies depend upon data characteristics, Microbiome, 5, 27, 10.1186/s40168-017-0237-y

Wu, 2016, An adaptive association test for microbiome data, Genome Med., 8, 56, 10.1186/s13073-016-0302-3

Zhan, 2015, An adaptive genetic association test using double kernel machines, Stat. Biosci., 7, 262, 10.1007/s12561-014-9116-2

Zhan, 2017, A fast small-sample kernel independence test for microbiome community-level association analysis, Biometrics, 73, 1453, 10.1111/biom.12684

Zhan, 2017, A small-sample multivariate kernel machine test for microbiome association studies, Genet. Epidemiol., 41, 210, 10.1002/gepi.22030

Zhan, 2018, A small-sample kernel association test for correlated data with application to microbiome association studies, Genet. Epidemiol., 42, 772, 10.1002/gepi.22160

Zhang, 2016, Zero-inflated negative binomial regression for differential abundance testing in microbiome studies, J. Bioinform. Genom., 2, 1, 10.18454/jbg.2016.2.2.1

Zhang, 2015, The oral and gut microbiomes are perturbed in rheumatoid arthritis and partly normalized after treatment, Nat. Med., 21, 895, 10.1038/nm.3914

Zhao, 2015, Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test, Am. J. Hum. Genet., 96, 797, 10.1016/j.ajhg.2015.04.003

Zhao, 2018, Generalized Hotelling's test for paired compositional data with application to human microbiome studies, Genet. Epidemiol., 42, 459, 10.1002/gepi.22127