Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love¹, Wolfgang Huber², Simon Anders²
¹ Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, 450 Brookline Avenue, Boston, MA 02215, USA
² Genome Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
