Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences
Abstract
Advances in sequencing technology provide special opportunities for genotyping individuals with speed and thrift, but the lack of software to automate the calling of tens of thousands of genotypes over hundreds of individuals has hindered progress. Stacks is a software system that uses short-read sequence data to identify and genotype loci in a set of individuals either de novo or by comparison to a reference genome. From reduced representation Illumina sequence data, such as RAD-tags, Stacks can recover thousands of single nucleotide polymorphism (SNP) markers useful for the genetic analysis of crosses or populations. Stacks can generate markers for ultra-dense genetic linkage maps, facilitate the examination of population phylogeography, and help in reference genome assembly. We report here the algorithms implemented in Stacks and demonstrate their efficacy by constructing loci from simulated RAD-tags taken from the stickleback reference genome and by recapitulating and improving a genetic map of the zebrafish, Danio rerio.