Statistical and Computational Methods for High-Throughput Sequencing Data Analysis of Alternative Splicing

Statistics in Biosciences - Volume 5 - Pages 138-155 - 2012
Liang Chen
Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, USA

Abstract

The burgeoning field of high-throughput sequencing significantly improves our ability to understand the complexity of transcriptomes. Alternative splicing, one of the most important driving forces of transcriptome diversity, can now be studied at unprecedented resolution. Efficient and powerful computational and statistical methods are urgently needed to facilitate the characterization and quantification of alternative splicing events. Here we discuss methods for splice junction read mapping and for exon-centric or isoform-centric quantification of alternative splicing. In addition, we discuss HITS-CLIP and splicing QTL analyses, two novel high-throughput-sequencing-based approaches for dissecting splicing regulation.
