De novo assembly and characterization of Camelina sativa transcriptome by paired-end sequencing

Springer Science and Business Media LLC - Volume 14 - Pages 1-11 - 2013
Chao Liang1, Xuan Liu2, Siu-Ming Yiu2, Boon Leong Lim1
1School of Biological Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
2Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China

Abstract

Biofuels extracted from the seeds of Camelina sativa have recently been used successfully as an environmentally friendly jet fuel to reduce greenhouse gas emissions. Camelina sativa is genetically very close to Arabidopsis thaliana; both are members of the Brassicaceae. Although public databases are currently available for some members of the Brassicaceae, such as A. thaliana, A. lyrata, Brassica napus, B. juncea and B. rapa, there are no public expressed sequence tag (EST) or genomic data for Camelina sativa. In this study, high-throughput, large-scale RNA sequencing (RNA-seq) of the Camelina sativa transcriptome was carried out to generate a database for further functional analyses. Illumina paired-end RNA-seq generated approximately 27 million clean reads (2.42 gigabase pairs), obtained by removing adaptor sequences, ambiguous reads and low-quality reads from the raw reads. These clean reads were assembled de novo into 83,493 unigenes with SOAPdenovo and into 103,196 transcripts with Trinity. The Trinity transcripts had an average length of 697 bp (N50 = 976 bp), longer than the SOAPdenovo unigenes (average 319 bp, N50 = 346 bp). Nonetheless, in BLASTN searches against the Arabidopsis thaliana CDS sequence database (TAIR), the SOAPdenovo assembly produced a similar number of non-redundant hits (22,435) to the Trinity assembly (22,433). Four public databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, the NCBI non-redundant protein database (NR) and the Clusters of Orthologous Groups (COG), were used for unigene annotation; 67,791 of the 83,493 unigenes (81.2%) were annotated with gene descriptions or conserved protein domains, which mapped to 25,329 non-redundant protein sequences. We mapped 27,042 of the 83,493 unigenes (32.4%) to 119 KEGG metabolic pathways.
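The assembly statistics reported above (average length, N50 and the number of non-redundant BLASTN hits) can be computed from assembly and search output with a few lines of code. The following is a minimal Python sketch, not the authors' pipeline. It assumes contig or transcript lengths are already available as integers and that BLASTN results are in the default 12-column BLAST+ tabular format (`-outfmt 6`). The function names and the e-value cutoff of 1e-5 are hypothetical choices, since the abstract does not state which threshold was used.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the cumulative sum to >= 50%.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def mean_length(lengths: Iterable[int]) -> float:
    """Return the arithmetic mean sequence length."""
    values = list(lengths)
    if not values:
        raise ValueError("mean_length() needs at least one sequence length")
    return sum(values) / len(values)


def count_nonredundant_hits(blast_lines: Iterable[str], max_evalue: float = 1e-5) -> int:
    """Count distinct subject sequences hit in BLAST tabular output.

    Assumes the default 12-column BLAST+ `-outfmt 6` layout, where column 2
    (index 1) is the subject ID and column 11 (index 10) is the e-value.
    The default max_evalue is a hypothetical cutoff.
    """
    subjects = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            subjects.add(fields[1])  # each subject CDS is counted only once
    return len(subjects)


if __name__ == "__main__":
    demo_lengths = [100, 200, 300, 400]
    print(n50(demo_lengths), mean_length(demo_lengths))  # 300 250.0
```

Counting distinct subject IDs, rather than query hits, is what makes two assemblies of very different sizes comparable: many short SOAPdenovo unigenes that hit the same TAIR CDS collapse to a single non-redundant hit.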
This is the first report of a transcriptome database for Camelina sativa, an environmentally important member of the Brassicaceae. We showed that C. sativa is closely related to Arabidopsis spp. and more distantly related to Brassica spp. Although the majority of annotated genes had high sequence identity to those of A. thaliana, a substantial proportion of disease-resistance genes (NBS-encoding LRR genes) were instead more similar to genes of other Brassicaceae, including the BrCN, BrCNL, BrNL, BrTN and BrTNL genes of B. rapa. As plant genomes are under long-term selection pressure from environmental stressors, the conservation of these disease-resistance genes in the C. sativa and B. rapa genomes implies that both species are exposed to threats from closely related pathogens in their natural habitats.
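The abstract does not describe how each disease-resistance gene was assigned to its closest relative. One common approach, assumed here for illustration only, is to search each unigene against protein sets from several species and take the species of the highest-scoring hit. The sketch below uses bitscore as the ranking criterion; the `Hit` record, the function names, and the gene IDs and scores in the demo are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Hit:
    query: str       # C. sativa unigene ID (hypothetical)
    species: str     # species whose database produced this hit
    subject: str     # matched gene in that species
    bitscore: float  # higher means a better alignment


def best_hit_species(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to the species of its highest-bitscore hit.

    Ties keep the hit that was seen first.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.species for query, hit in best.items()}


def fraction_closest_to_other(assignments: Dict[str, str], reference: str) -> float:
    """Fraction of queries whose best hit is NOT from the reference species."""
    if not assignments:
        raise ValueError("no assignments to summarise")
    others = sum(1 for species in assignments.values() if species != reference)
    return others / len(assignments)


if __name__ == "__main__":
    demo = [
        Hit("g1", "A. thaliana", "AT_gene_1", 500.0),
        Hit("g1", "B. rapa", "Br_gene_1", 450.0),
        Hit("g2", "A. thaliana", "AT_gene_2", 300.0),
        Hit("g2", "B. rapa", "Br_gene_2", 420.0),
        Hit("g3", "B. rapa", "Br_gene_3", 200.0),
    ]
    calls = best_hit_species(demo)
    print(calls)  # {'g1': 'A. thaliana', 'g2': 'B. rapa', 'g3': 'B. rapa'}
    print(fraction_closest_to_other(calls, "A. thaliana"))
```

Ranking by e-value or percent identity instead of bitscore would be an equally plausible choice; the abstract does not say which criterion underlies the reported similarities.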
