GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

PeerJ - Volume 6 - Page e4600

Li Chen¹, James Reeve², Lujun Zhang³, Nancy Y. Ip², Xuefeng Wang⁴, Jun Chen⁵

¹Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, USA

²Bioinformatics and Computational Biology Program, University of Minnesota—Rochester, Rochester, MN, USA

³College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, China

⁴Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA

⁵Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA

Abstract

Normalization is the first critical step in microbiome sequencing data analysis and is used to account for variable library sizes. Current RNA-Seq-based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of such data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address zero-inflation remain largely undeveloped. Here we propose geometric mean of pairwise ratios (GMPR), a simple but effective normalization method for zero-inflated sequencing data such as microbiome data. Simulation studies and analyses of real datasets demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.
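The abstract names the method but does not spell out its steps. As a rough illustration only, the idea of a geometric mean of pairwise ratios can be sketched as follows: for each pair of samples, take the median count ratio over the taxa that are nonzero in both, then combine a sample's pairwise ratios with a geometric mean to obtain its size factor. The function name `gmpr_size_factors`, the `min_shared` parameter, and the taxa-by-samples layout are assumptions made for this sketch; it is not the authors' reference implementation.

```python
import numpy as np


def gmpr_size_factors(counts, min_shared=1):
    """Sketch of GMPR-style size factors (hypothetical helper, not the authors' code).

    counts: array-like of shape (n_taxa, n_samples) with non-negative counts.
    min_shared: assumed minimum number of taxa that must be nonzero in both
        samples for a pairwise ratio to be used.
    Returns one size factor per sample (NaN if no usable pair exists).
    """
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[1]
    size_factors = np.full(n_samples, np.nan)

    for i in range(n_samples):
        log_ratios = []
        # The sample is also compared with itself (ratio 1), so that the
        # geometric mean runs over all samples.
        for j in range(n_samples):
            # Use only taxa observed in both samples, which sidesteps zeros.
            shared = (counts[:, i] > 0) & (counts[:, j] > 0)
            if shared.sum() < min_shared:
                continue
            ratio = np.median(counts[shared, i] / counts[shared, j])
            log_ratios.append(np.log(ratio))
        if log_ratios:
            # Geometric mean of the pairwise median ratios.
            size_factors[i] = np.exp(np.mean(log_ratios))

    return size_factors
```

Under these assumptions, normalized abundances would be obtained by dividing each sample's counts by its size factor, for example `counts / gmpr_size_factors(counts)` for a NumPy array `counts`. Restricting each pairwise ratio to shared nonzero taxa is what lets the calculation proceed even when no taxon is present in every sample.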
