Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

F1000Research - Volume 4 - Page 1521
Charlotte Soneson [1,2], Michael I. Love [3,4], Mark D. Robinson [1,2]
[1] Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057
[2] SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057
[3] Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02210
[4] Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, 02115

Abstract

High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim to compare either abundance levels or transcriptome composition between given conditions. As a first step, the sequencing reads must be used to quantify the abundance of the transcriptomic features of interest, such as genes or transcripts. Several quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of the underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices. Incorporating transcript-level abundance estimates improves performance in simulated data, but the difference is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
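
The idea of carrying transcript-level estimates into a gene-level analysis can be illustrated with a minimal Python sketch. It is not the tximport API (tximport is an R package), and the function name `summarize_to_gene`, the input layout, and the toy values are assumptions made here for illustration only. The sketch sums estimated counts and abundances per gene and computes an abundance-weighted average transcript length, which a count-based tool could plausibly use as a per-gene offset.

```python
from collections import defaultdict


def summarize_to_gene(transcripts, tx2gene):
    """Aggregate transcript-level estimates to gene level.

    transcripts: dict of transcript id -> (estimated count, TPM, effective length)
    tx2gene:     dict of transcript id -> gene id
    (Both layouts are hypothetical, chosen for this sketch.)
    """
    counts = defaultdict(float)
    tpm = defaultdict(float)
    weighted_len = defaultdict(float)

    for tx, (count, abundance, length) in transcripts.items():
        gene = tx2gene[tx]
        counts[gene] += count                     # gene count = sum of transcript counts
        tpm[gene] += abundance                    # gene TPM = sum of transcript TPMs
        weighted_len[gene] += abundance * length  # numerator of the weighted mean length

    result = {}
    for gene in counts:
        # Abundance-weighted average length; undefined when the gene has zero abundance.
        avg_len = weighted_len[gene] / tpm[gene] if tpm[gene] > 0 else None
        result[gene] = {"count": counts[gene], "tpm": tpm[gene], "avg_length": avg_len}
    return result


# Toy input: geneA has two isoforms, geneB has one.
transcripts = {
    "tx1": (100.0, 10.0, 1000.0),
    "tx2": (50.0, 30.0, 500.0),
    "tx3": (20.0, 5.0, 2000.0),
}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_level = summarize_to_gene(transcripts, tx2gene)
print(gene_level["geneA"])
```

In this toy input, geneA's average length is pulled toward the shorter but more abundant isoform. That shift is the kind of isoform-usage effect that a simple count matrix does not account for.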
