Detecting differential usage of exons from RNA-seq data

Genome Research - Volume 22, Issue 10 - Pages 2008-2017 - 2012
Simon Anders¹, Alejandro Reyes¹, Wolfgang Huber¹
¹ European Molecular Biology Laboratory, 69111 Heidelberg, Germany

Abstract

RNA-seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types, or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-seq data. DEXSeq uses generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects with high sensitivity genes, and in many cases exons, that are subject to differential exon usage. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.
