A New Chicken Genome Assembly Provides Insight into Avian Genome Structure
Abstract
The importance of Gallus gallus (chicken) as a model organism and agricultural animal merits continued efforts to improve its sequence assembly. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built by combining long single-molecule sequencing technology, finished BACs, and improved physical maps. Overall, the assembly gains 183 Mb of assembled bases, including 16.4 Mb in placed chromosomes, with a corresponding gain in the percentage of intact repeat elements characterized. Within the 1.21 Gb genome, the assembly now includes three previously missing autosomes (GGA30, 31, and 33) and improves sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite these substantial improvements in base representation, 138 Mb of sequence is not yet assigned to chromosomes. When annotated for gene content, Gallus_gallus-5.0 contains 4679 more annotated genes (2768 noncoding and 1911 protein-coding) than Gallus_gallus-4.0. We also revisited the question of which genes are missing in the avian lineage, as assessed with the highest-quality avian genome assembly to date, and found that a large fraction of the original set of missing genes is still absent from sequenced bird species. Finally, our new data support a detailed map of MHC-B encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts.
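The abstract reports a 10-fold improvement in contig length but does not name the statistic behind that figure. Contig N50 is the conventional summary of assembly contiguity, so the sketch below assumes it: a minimal Python illustration of how N50 and a fold change between two assemblies could be computed from lists of contig lengths. The function names `n50` and `fold_improvement` and the toy lengths are hypothetical and are not data from Gallus_gallus-4.0 or Gallus_gallus-5.0.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of all assembled bases (0 for empty input)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running to total to avoid floating-point halves.
        if 2 * running >= total:
            return length
    return 0


def fold_improvement(old_lengths: Sequence[int], new_lengths: Sequence[int]) -> float:
    """Ratio of the new assembly's contig N50 to the old assembly's (assumed metric)."""
    old = n50(old_lengths)
    if old == 0:
        raise ValueError("old assembly has no contigs")
    return n50(new_lengths) / old


# Hypothetical toy contig lengths (not real chicken assembly data).
old_contigs = [100, 80, 60, 40, 20]
new_contigs = [1000, 800, 600, 400, 200]
print(n50(old_contigs))  # → 80
print(fold_improvement(old_contigs, new_contigs))  # → 10.0
```

Sorting lengths in descending order and stopping at the first contig where the running sum reaches half the total is the usual N50 definition; with a real assembly, the lengths would be read from its FASTA file rather than typed in by hand.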