De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping

Oxford University Press (OUP) - Volume 4 - Pages 1-9 - 2015
Remi-Andre Olsen1, Ignas Bunikis2, Ievgeniia Tiukova3, Kicki Holmberg1, Britta Lötstedt1, Olga Vinnere Pettersson2, Volkmar Passoth3, Max Käller1, Francesco Vezzi1
1Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
2Uppsala Genome Center, NGI/SciLifeLab, Department of Immunology, Genetics and Pathology, Uppsala University, BMC, Uppsala, Sweden
3Department of Microbiology, Swedish University of Agricultural Sciences, Uppsala, Sweden

Abstract

It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers), it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high-quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand, we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness; on the other hand, we present a de novo assembly pipeline that we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, we present the results obtained using the Dekkera bruxellensis genome.

In this work we used short-read Illumina data and long-read PacBio data, combined with the extreme long-range information from OpGen optical maps, for de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work.

We obtained a high-quality draft assembly of a yeast genome, resolved at the chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors, as demonstrated by resolving a large collapsed repeat and by the higher scores it received from assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5% of the optically mapped genome not covered by the Illumina data.
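To make the last figure concrete, the following is a minimal sketch of how a coverage gain of this kind could be computed once contigs have been placed on an optical map. It is not the authors' method and is not part of NouGAT: the function names, the half-open interval representation, and the toy coordinates are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a contig placement on the optical map is a
# half-open interval [start, end) in base pairs.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals: Iterable[Interval]) -> int:
    """Total number of map positions covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def coverage_gain(map_length: int,
                  base: Iterable[Interval],
                  extra: Iterable[Interval]) -> float:
    """Fraction of the map covered by base + extra but not by base alone."""
    if map_length <= 0:
        raise ValueError("map_length must be positive")
    base_list = list(base)
    gained = covered_length(base_list + list(extra)) - covered_length(base_list)
    return gained / map_length


if __name__ == "__main__":
    # Toy coordinates only; these are not data from the paper.
    illumina = [(0, 400), (450, 900)]
    pacbio = [(380, 470), (880, 1000)]
    print(f"gain: {coverage_gain(1000, illumina, pacbio):.2%}")
```

In this toy case the short-read placements leave two gaps on a 1,000 bp map, and the long-read placements close both, so the gain is the combined gap length divided by the map length. Merging intervals before summing avoids double-counting regions where placements overlap.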
