Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement

PLoS ONE - Volume 9, Issue 11 - Page e112963
Bruce J. Walker1, Thomas Abeel1,2, Terrance Shea1, Margaret Priest1, Amr Abouelliel1, Sharadha Sakthikumar1, Christina A. Cuomo1, Qiandong Zeng1, Jennifer R. Wortman1, Sarah Young1, Ashlee M. Earl1
1Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America.
2VIB Department of Plant Systems Biology, Ghent University, Ghent, Belgium
