CastorDB: a comprehensive knowledge base for Ricinus communis

Springer Science and Business Media LLC - Volume 4 - Pages 1-9 - 2011
Shalabh Thakur1, Sanjay Jha2, Bharat B Chattoo1
1Centre for Genome Research, Department of Microbiology and Biotechnology Centre, Faculty of Science, The M. S. University of Baroda, Vadodara, India
2Department of Biotechnology, ASPEE College of Horticulture and Forestry, Navsari Agricultural University, Navsari, India

Abstract

Ricinus communis is an industrially important non-edible oil seed crop, native to tropical and subtropical regions of the world. Although the R. communis genome was assembled as a 4X draft by JCVI and is predicted to contain 31,221 proteins, the function of most of the genes remains to be elucidated. A large amount of information on different aspects of the biology of R. communis is available, but most of the data are scattered and not easily accessible. Therefore, a comprehensive resource on castor, CastorDB, is required to facilitate research on this important plant. CastorDB is a specialized and comprehensive database for the oil seed plant R. communis, integrating information from several diverse resources. As primary data, CastorDB contains gene and protein sequences, gene expression data, and gene ontology annotations of protein sequences, all obtained from a variety of repositories. In addition, computational analysis was used to predict cellular localization, domains, pathways, protein-protein interactions, sumoylation sites, and biochemical properties, and these predictions are included as derived data. The database has an intuitive user interface that prompts the user to explore the information resources available for a given gene or protein. CastorDB provides a user-friendly, comprehensive resource on castor, with particular emphasis on its genome, transcriptome, and proteome, and on protein domains, pathways, protein localization, sumoylation sites, expression data, and protein interaction partners.
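The split between primary and derived data described above can be pictured with a small record type. The following Python sketch is only an illustration under stated assumptions: the class names, field names, gene identifier, and sample values are all hypothetical and are not taken from the actual CastorDB schema, which the abstract does not describe.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class PrimaryData:
    """Data collected from external repositories (hypothetical layout)."""
    gene_sequence: str = ""
    protein_sequence: str = ""
    expression: dict[str, float] = field(default_factory=dict)
    go_terms: list[str] = field(default_factory=list)


@dataclass
class DerivedData:
    """Predictions from computational analysis (hypothetical layout)."""
    localization: str = ""
    domains: list[str] = field(default_factory=list)
    pathways: list[str] = field(default_factory=list)
    interaction_partners: list[str] = field(default_factory=list)
    sumoylation_sites: list[int] = field(default_factory=list)
    biochemical_properties: dict[str, float] = field(default_factory=dict)


@dataclass
class GeneRecord:
    """One gene or protein entry combining both kinds of data."""
    gene_id: str  # placeholder identifier, not a real CastorDB accession
    primary: PrimaryData = field(default_factory=PrimaryData)
    derived: DerivedData = field(default_factory=DerivedData)

    def available_sections(self) -> list[str]:
        """List the sections that actually hold data for this entry."""
        sections = []
        for group in (self.primary, self.derived):
            for f in fields(group):
                if getattr(group, f.name):  # skip empty strings, lists, dicts
                    sections.append(f.name)
        return sections


# Placeholder values chosen only to exercise the sketch.
record = GeneRecord(
    gene_id="example_gene_1",
    primary=PrimaryData(protein_sequence="MKT", go_terms=["GO:0008150"]),
    derived=DerivedData(localization="chloroplast", sumoylation_sites=[42]),
)
print(record.available_sections())
```

Keeping the two groups in separate types makes it explicit which values were collected from repositories and which are predictions, and a method like `available_sections` mirrors the idea of an interface that offers only the resources present for a given entry.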
