Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome
Abstract
One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage, large-insert genomic libraries are a crucial tool for attaining this objective. We report here the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. The EcoRI library consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents, so the probability of finding any particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BES length corresponded to known retroelements, while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats are estimated to account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes, suggesting that the oak genome contains at least 29,340 genes. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a future genome sequence for oak.
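The link between "12 haploid genome equivalents" and "greater than 99%" follows the classic Clarke and Carbon (1976) reasoning for random clone libraries: if N clones of insert size I are drawn uniformly from a haploid genome of size G, a given locus is missed by all of them with probability (1 - I/G)^N, which approaches exp(-c) for coverage c = N·I/G. The minimal sketch below assumes uniform, independent sampling and treats the empty and organelle fractions as shares of all clones that carry no nuclear sequence. The function and parameter names are hypothetical, and the haploid genome size is an input that the abstract does not state.

```python
import math


def genome_equivalents(n_clones: int, empty_fraction: float,
                       organelle_fraction: float, mean_insert_kb: float,
                       haploid_genome_kb: float) -> float:
    """Haploid genome equivalents carried by a large-insert library.

    Assumption: empty clones and chloroplast/mitochondrial clones are
    given as fractions of all clones and contribute no nuclear sequence,
    so they are subtracted before multiplying by the mean insert size.
    """
    nuclear_clones = n_clones * (1.0 - empty_fraction - organelle_fraction)
    return nuclear_clones * mean_insert_kb / haploid_genome_kb


def recovery_probability(n_clones: float, mean_insert_kb: float,
                         haploid_genome_kb: float) -> float:
    """Clarke-Carbon probability that a given locus lies in at least one
    of n_clones random clones: P = 1 - (1 - I/G)**N."""
    return 1.0 - (1.0 - mean_insert_kb / haploid_genome_kb) ** n_clones


def recovery_probability_from_coverage(coverage: float) -> float:
    """Large-library approximation: (1 - I/G)**N ~= exp(-N*I/G) = exp(-c),
    where c is the number of haploid genome equivalents."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # 12 haploid genome equivalents, as reported for the oak library.
    print(f"{recovery_probability_from_coverage(12):.6f}")  # 0.999994
```

At 12 genome equivalents the approximation gives about 0.999994, well above the 99% threshold. The exact form needs the clone count, insert size and genome size; for libraries of tens of thousands of clones it is practically indistinguishable from the coverage-only form, which is why coverage alone is usually quoted.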