Barnacle: detecting and characterizing tandem duplications and fusions in transcriptome assemblies
Abstract
Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers. We describe Barnacle, a production-grade analysis tool that detects such chimeras in de novo assemblies of RNA-seq data and supports prioritizing them for review and validation by reporting the relative coverage of co-occurring chimeric and wild-type transcripts. We demonstrate applications in large-scale disease studies by identifying PTDs in MLL, ITDs in FLT3, and reciprocal fusions between PML and RARA in two deeply sequenced acute myeloid leukemia (AML) RNA-seq datasets. Our analyses of real and simulated datasets show that, with appropriate filter settings, Barnacle makes highly specific predictions for three types of chimeric transcripts that are important in a range of cancers: PTDs, ITDs, and fusions. High specificity makes manual review and validation efficient, which is necessary in large-scale disease studies. Characterizing an extended range of chimera types will help generate insights into progression, treatment, and outcomes for complex diseases.