<i>Stacks</i>: Building and Genotyping Loci <i>De Novo</i> From Short-Read Sequences

G3: Genes, Genomes, Genetics, Volume 1, Issue 3, Pages 171-182, 2011
Julian Catchen<sup>1</sup>, Angel Amores<sup>2</sup>, Paul A. Hohenlohe<sup>1</sup>, William A. Cresko<sup>1</sup>, John H. Postlethwait<sup>2</sup>
<sup>1</sup>Center for Ecology and Evolutionary Biology
<sup>2</sup>Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403

Abstract

Advances in sequencing technology provide special opportunities for genotyping individuals with speed and thrift, but the lack of software to automate the calling of tens of thousands of genotypes over hundreds of individuals has hindered progress. Stacks is a software system that uses short-read sequence data to identify and genotype loci in a set of individuals either <i>de novo</i> or by comparison to a reference genome. From reduced representation Illumina sequence data, such as RAD-tags, Stacks can recover thousands of single nucleotide polymorphism (SNP) markers useful for the genetic analysis of crosses or populations. Stacks can generate markers for ultra-dense genetic linkage maps, facilitate the examination of population phylogeography, and help in reference genome assembly. We report here the algorithms implemented in Stacks and demonstrate their efficacy by constructing loci from simulated RAD-tags taken from the stickleback reference genome and by recapitulating and improving a genetic map of the zebrafish, <i>Danio rerio</i>.
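As a rough illustration of what building loci de novo from short reads can involve, the toy Python sketch below collapses identical reads into "stacks," discards low-depth stacks as likely sequencing errors, and merges stacks that differ at only a few positions into putative loci, whose disagreeing columns are candidate SNPs. It is a simplified sketch under stated assumptions, not the algorithm reported in the paper: the function names and the `min_depth` and `max_mismatch` thresholds are hypothetical, reads are assumed to be equal-length and already trimmed, and genotype calling is omitted entirely.

```python
from collections import Counter
from itertools import combinations


def build_stacks(reads, min_depth=3):
    """Collapse identical reads into stacks; keep those seen at least min_depth times.

    min_depth is a hypothetical threshold: rarer sequences are treated as likely errors.
    """
    counts = Counter(reads)
    return {seq: depth for seq, depth in counts.items() if depth >= min_depth}


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def merge_into_loci(stacks, max_mismatch=2):
    """Group stacks differing at <= max_mismatch positions into putative loci.

    Single-linkage clustering with union-find; max_mismatch is a hypothetical threshold.
    """
    seqs = sorted(stacks)
    parent = {s: s for s in seqs}

    def find(s):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        if len(a) == len(b) and hamming(a, b) <= max_mismatch:
            parent[find(a)] = find(b)

    loci = {}
    for s in seqs:
        loci.setdefault(find(s), []).append(s)
    return list(loci.values())


def snp_positions(locus):
    """Column indices where the stacks of one locus disagree (candidate SNPs)."""
    return [i for i, column in enumerate(zip(*locus)) if len(set(column)) > 1]


if __name__ == "__main__":
    # Hypothetical reads from one individual: two alleles of one locus, a second
    # locus, and a singleton read carrying a sequencing error.
    reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"] * 4 + ["GGGGCCCC"] * 6 + ["ACGAACGT"]
    stacks = build_stacks(reads)
    for locus in merge_into_loci(stacks):
        print(locus, "SNPs at", snp_positions(locus))
```

Running it prints `['ACGTACGT', 'ACGTTCGT'] SNPs at [4]` and `['GGGGCCCC'] SNPs at []`: the singleton read is filtered out, the two alleles that differ at one position are merged into a single locus, and the unrelated sequence stays a separate locus. The pairwise comparison is quadratic in the number of stacks, which is fine for a toy example but would need indexing to scale to real RAD-tag data.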
