Mobster: accurate detection of mobile element insertions in next generation sequencing data
Abstract
Mobile elements are major drivers of change in genomic architecture and can cause disease. Their detection is hindered by the low mappability of their highly repetitive sequences. We have developed an algorithm, called Mobster, to detect non-reference mobile element insertions in next-generation sequencing data from both whole genome and whole exome studies. Mobster uses discordant read pairs and clipped reads in combination with consensus sequences of known active mobile elements. Mobster has a low false discovery rate and a high recall rate for both L1 and Alu elements. Mobster is available at http://sourceforge.net/projects/mobster.
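To make the two read-level signals named above concrete, the following is a minimal sketch of how discordant read pairs and clipped reads might be flagged from SAM-formatted alignments. It is not Mobster's implementation: the function names, the `min_clip` threshold of 20 bases, and the simple flag-based definition of "discordant" are assumptions chosen for illustration, and the step that compares candidate reads against mobile element consensus sequences is left out.

```python
import re

# Matches one CIGAR operation, e.g. "70M" or "30S".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clip lengths from a CIGAR string."""
    if cigar == "*":
        return 0, 0
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips (H) sit outside soft clips, so drop them first.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def classify_sam_record(line, min_clip=20):
    """Label one SAM record as 'discordant' and/or 'clipped'.

    Illustrative definitions (assumptions, not Mobster's criteria):
    - discordant: read is paired and mapped, but the aligner did not
      mark the pair as properly paired (SAM flag 0x2 unset)
    - clipped: read is mapped with at least min_clip soft-clipped
      bases at either end
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    cigar = fields[5]

    paired = bool(flag & 0x1)
    proper_pair = bool(flag & 0x2)
    unmapped = bool(flag & 0x4)

    labels = set()
    if paired and not proper_pair and not unmapped:
        labels.add("discordant")
    if not unmapped and max(soft_clip_lengths(cigar)) >= min_clip:
        labels.add("clipped")
    return labels


if __name__ == "__main__":
    # Hypothetical records: mate on another chromosome, then a clipped read.
    discordant = "r1\t65\tchr1\t100\t60\t100M\tchr5\t5000\t0\t*\t*"
    clipped = "r2\t99\tchr1\t200\t60\t70M30S\t=\t450\t350\t*\t*"
    print(sorted(classify_sam_record(discordant)))
    print(sorted(classify_sam_record(clipped)))
```

In a sketch like this, reads carrying either label would be the candidates whose unanchored or clipped sequence is then compared with mobile element consensus sequences.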