YMAP: a pipeline for visualization of copy number variation and loss of heterozygosity in eukaryotic pathogens
Abstract
The design of effective antimicrobial therapies for serious eukaryotic pathogens requires a clear understanding of their highly variable genomes. To facilitate analysis of copy number variations, single nucleotide polymorphisms and loss of heterozygosity events in these pathogens, we developed a pipeline for analyzing diverse genome-scale datasets from microarray, deep sequencing, and restriction site associated DNA sequence experiments for clinical and laboratory strains of Candida albicans, the most prevalent human fungal pathogen. The YMAP pipeline (http://lovelace.cs.umn.edu/Ymap/) automatically illustrates genome-wide information in a single intuitive figure and is readily modified for the analysis of other pathogens with small genomes.