Differential expression analysis for sequence count data

Genome Biology - Tập 11 - Trang 1-12 - 2010

Simon Anders¹, Wolfgang Huber¹

¹European Molecular Biology Laboratory, Heidelberg, Germany

Tóm tắt

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.

Tài liệu tham khảo

Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320: 1344–1349. 10.1126/science.1158441. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621–628. 10.1038/nmeth.1226. Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, Thiessen N, Griffith OL, He A, Marra M, Snyder M, Jones S: Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007, 4: 651–657. 10.1038/nmeth1068. Licatalosi DD, Mele A, Fak JJ, Ule J, Kayikci M, Chi SW, Clark TA, Schweitzer AC, Blume JE, Wang X, Darnell JC, Darnell RB: HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature. 2008, 456: 464–469. 10.1038/nature07488. Smith AM, Heisler LE, Mellor J, Kaper F, Thompson MJ, Chee M, Roth FP, Giaever G, Nislow C: Quantitative phenotyping via deep barcode sequencing. Genome Res. 2009, 19: 1836–1842. 10.1101/gr.093955.109. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 1509–1517. 10.1101/gr.079558.108. Wang L, Feng Z, Wang X, Wang X, Zhang X: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics. 2010, 26: 136–138. 10.1093/bioinformatics/btp612. Robinson MD, Smyth GK: Moderated statistical tests for assessing differences in tag abundance. Bioinformatics. 2007, 23 (21): 2881–2887. 10.1093/bioinformatics/btm453. Whitaker L: On the Poisson law of small numbers. Biometrika. 1914, 10: 36–71. 10.1093/biomet/10.1.36. Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26: 139–140. 10.1093/bioinformatics/btp616. Robinson MD, Smyth GK: Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 2008, 9: 321–332. 10.1093/biostatistics/kxm030. Cameron AC, Trivedi PK: Regression Analysis of Count Data. 1998, Cambridge University Press Robinson MD, Oshlack A: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010, 11: R25-10.1186/gb-2010-11-3-r25. Loader C: Local Regression and Likelihood. 1999, Springer McCullagh P, Nelder JA: Generalized Linear Models. 1989, Chapman & Hall/CRC, 2 locfit: Local regression, likelihood and density estimation. [https://doi.org/cran.r-project.org/web/packages/locfit/] Agresti A: Categorical Data Analysis. 2002, Wiley, 2 Engström P, Tommei D, Stricker S, Smith A, Pollard S, Bertone P: Transcriptional characterization of glioblastoma stem cell lines using tag sequencing. 2010 Morrissy AS, Morin RD, Delaney A, Zeng T, McDonald H, Jones S, Zhao Y, Hirst M, Marra MA: Next-generation tag sequencing for cancer gene expression profiling. Genome Res. 2009, 19: 1825–1835. 10.1101/gr.094482.109. Kasowski M, Grubert F, Heffelfinger C, Hariharan M, Asabere A, Waszak SM, Habegger L, Rozowsky J, Shi M, Urban AE, Hong MY, Karczewski KJ, Huber W, Weissman SM, Gerstein MB, Korbel JO, Snyder M: Variation in transcription factor binding among humans. Science. 2010, 328: 232–235. 10.1126/science.1183621. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 1995, 57: 289–300. Bullard J, Purdom E, Hansen K, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010, 11: 94-10.1186/1471-2105-11-94. Bloom JS, Khan Z, Kruglyak L, Singh M, Caudy AA: Measuring differential gene expression by short read sequencing: quantitative comparison to 2-channel gene expression microarrays. BMC Genomics. 2009, 10: 221-10.1186/1471-2164-10-221. Smyth GK: Limma: linear models for microarray data. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Edited by: Gentleman R, Carey V, Dudoit S, R Irizarry WH. 2005, New York: Springer, 397–420. full_text. Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: Article3- Lönnstedt I, Speed T: Replicated microarray data. Stat Sin. 2002, 12: 31–46. R: A Language and Environment for Statistical Computing. [https://doi.org/www.R-project.org] Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80. Bliss CI, Fisher RA: Fitting the negative binomial distribution to biological data. Biometrics. 1953, 9: 176–200. 10.2307/3001850. Clark SJ, Perry JN: Estimation of the negative binomial parameter κ by maximum quasi-likelihood. Biometrics. 1989, 45: 309–316. 10.2307/2532055. Lawless JF: Negative binomial and mixed Poisson regression. Can J Stat. 1987, 15: 209–225. 10.2307/3314912. Saha K, Paul S: Bias-corrected maximum likelihood estimator of the negative binomial dispersion parameter. Biometrics. 2005, 61: 179–285. 10.1111/j.0006-341X.2005.030833.x. Fast and accurate computation of binomial probabilities. (Note: This is a copy of the original paper, which is no longer available online.), [https://doi.org/projects.scipy.org/scipy/raw-attachment/ticket/620/loader2000Fast.pdf] Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25-10.1186/gb-2009-10-3-r25. HTSeq: Analysing high-throughput sequencing data with Python. [https://doi.org/www-huber.embl.de/users/anders/HTSeq/] DESeq. [https://doi.org/www-huber.embl.de/users/anders/DESeq]

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA