A New Chicken Genome Assembly Provides Insight into Avian Genome Structure

G3: Genes, Genomes, Genetics - Volume 7, Issue 1 - Pages 109-117 - 2017
Wesley C. Warren1, LaDeana W. Hillier1, Chad Tomlinson1, Patrick Minx1, Milinn Kremitzki1, Tina Graves1, Chris Markovic1, Nathan Bouk2,3, Kim D. Pruitt2,3, Françoise Thibaud‐Nissen2,3, Valérie Schneider2,3, Tamer Mansour4, C. Titus Brown4, Aleksey V. Zimin5, Rachel Hawken6, Mitch Abrahamsen6, Alexis Black Pyrkosz7, Mireille Morisson8, Valérie Fillon8, Alain Vignal8, William Chow9, Kerstin Howe9, Janet E. Fulton10, Marcia M. Miller11, Peter V. Lovell12, Claudio V. Mello12, Morgan Wirthlin12, Andrew S. Mason13, Richard Kuo13, David W. Burt13, Jerry B. Dodgson14, Hans H. Cheng7
1McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108
2National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
3Natl Lib Med, Natl Ctr Biotechnol Informat (United States)
4UC Davis - University of California [Davis] (One Shields Avenue, Davis, CA 95616-5294 - United States)
5Institute for Physical Sciences and Technology, University of Maryland, College Park, Maryland 20742
6Cobb-Vantress, Inc., Siloam Springs, Arkansas 72761-1030
7United States Department of Agriculture-Agricultural Research Service, Avian Disease and Oncology, East Lansing, Michigan 48823
8Génétique Physiologie et Systèmes d’Elevage, Université de Toulouse, Institut National de la Recherche Agronomique, Auzeville Castanet Tolosan, France
9The Wellcome Trust Sanger Institute [Cambridge] (Hinxton, Cambridge CB10 1SA, UK - United Kingdom)
10Hy-Line International, Dallas, Iowa 50063
11Beckman Research Institute, Duarte, California 91010-3000
12OHSU - Oregon Health and Science University [Portland] (3181 S.W. Sam Jackson Park Rd. Portland, Oregon 97239-3098 - United States)
13The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian EH25 9RG, United Kingdom
14Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824

Abstract

The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts.
