AgAnimalGenomes: browsers for viewing and manually annotating farm animal genomes

Springer Science and Business Media LLC - Volume 34 - Pages 418-436 - 2023
Deborah A. Triant1, Amy T. Walsh1, Gabrielle A. Hartley2, Bruna Petry3, Morgan R. Stegemiller4, Benjamin M. Nelson1, Makenna M. McKendrick5, Emily P. Fuller2, Noelle E. Cockett6, James E. Koltes3, Stephanie D. McKay7, Jonathan A. Green1, Brenda M. Murdoch4, Darren E. Hagen5, Christine G. Elsik1,8,9
1Division of Animal Sciences, University of Missouri, Columbia, USA
2Department of Molecular and Cell Biology, University of Connecticut, Storrs, USA
3Department of Animal Science, Iowa State University, Ames, USA
4Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, USA
5Department of Animal and Food Sciences, Oklahoma State University, Stillwater, USA
6Department of Animal, Dairy, and Veterinary Sciences, Utah State University, Logan, USA
7Department of Animal and Veterinary Sciences, University of Vermont, Burlington, USA
8Division of Plant Science & Technology, University of Missouri, Columbia, USA
9Institute for Data Science & Informatics, University of Missouri, Columbia, USA

Abstract

Current genome sequencing technologies have made it possible to generate highly contiguous genome assemblies for non-model animal species. Despite advances in genome assembly methods, there is still room for improvement in the delineation of specific gene features in the genomes. Here we present genome visualization and annotation tools to support seven livestock species (bovine, chicken, goat, horse, pig, sheep, and water buffalo), available in a new resource called AgAnimalGenomes. In addition to supporting the manual refinement of gene models, these browsers provide visualization tracks for hundreds of RNAseq experiments, as well as data generated by the Functional Annotation of Animal Genomes (FAANG) Consortium. For species with predicted gene sets from both Ensembl and RefSeq, the browsers provide special tracks showing the thousands of protein-coding genes that disagree across the two gene sources, serving as a valuable resource to alert researchers to gene model issues that may affect data interpretation. We describe the data and search methods available in the new genome browsers and how to use the provided tools to edit and create new gene models.
