RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR

F1000Research - Volume 5 - Page 1408
Charity W. Law1,2, Monther Alhamdoosh3, Shian Su2, Gordon K. Smyth4,2, Matthew E. Ritchie1,4,2
1Department of Medical Biology, The University of Melbourne, Parkville, 3010
2The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052
3CSL Limited, Parkville, Victoria, 3010
4School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010

Abstract

The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis, with the results informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package, which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.
