Data shopping in an open marketplace: Introducing the Ontogrator web application for marking up data using ontologies and browsing using facets

Standards in Genomic Sciences - Volume 4 - Pages 286-292 - 2011

Norman Morrison¹,², David Hancock¹,², Lynette Hirschman³, Peter Dawyndt⁴, Bert Verslyppe⁴, Nikos Kyrpides⁵, Renzo Kottmann⁶, Pelin Yilmaz⁶,⁷, Frank Oliver Glöckner⁶, Jeff Grethe⁸, Tim Booth², Peter Sterk⁹, Goran Nenadic¹, Dawn Field²,⁹

¹School of Computer Science, University of Manchester, Manchester, UK

²NERC Centre for Ecology and Hydrology, Natural Environment Research Council Environmental Bioinformatics Centre, Wallingford, UK

³The MITRE Corporation, Bedford, USA

⁴Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium

⁵DOE Joint Genome Institute, Walnut Creek, USA

⁶Microbial Genomics Group, Max Planck Institute for Marine Microbiology, Bremen, Germany

⁷Jacobs University Bremen GmbH, Bremen, Germany

⁸University of California, San Diego, La Jolla, USA

⁹Molecular Evolution and Bioinformatics Group, NERC Centre for Ecology and Hydrology, Wallingford, UK

Abstract

In the future, we hope to see an open and thriving data market in which users can find and select data from a wide range of data providers. In such an open access market, data are products that must be packaged accordingly. Increasingly, eCommerce sellers present heterogeneous product lines to buyers using faceted browsing. Using this approach, we have developed the Ontogrator platform, which allows for rapid retrieval of data in a way that would be familiar to any online shopper. Drawing on Knowledge Organization Systems (KOS), especially ontologies, Ontogrator uses text mining to mark up data and faceted browsing to help users navigate, query and retrieve data. Ontogrator offers the potential to impact scientific research in two major ways: 1) by significantly improving the retrieval of relevant information; and 2) by significantly reducing the time required to compose standard database queries and assemble information for further research. Here we present a pilot implementation, developed in collaboration with the Genomic Standards Consortium (GSC), that includes content from the StrainInfo, GOLD, CAMERA, SILVA and PubMed databases. This implementation demonstrates the power of ontogration and highlights that the usefulness of this approach depends fully on the quality of both the data and the KOS (ontologies) used. Ideally, the use and further expansion of this collaborative system will help to surface issues associated with the underlying quality of annotation and could lead to a systematic means for accessing integrated data resources.
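The two mechanisms named in the abstract, marking up records with ontology terms and then browsing by facet, can be illustrated with a minimal sketch. This is not Ontogrator's implementation: the term dictionary, the facet names, the sample records and the functions `mark_up`, `facet_counts` and `select` are all hypothetical, and naive substring matching stands in for real text mining.

```python
from collections import Counter

# Hypothetical mini-vocabulary: term label -> facet (illustrative only,
# not taken from any real ontology or from the paper).
TERMS = {
    "marine sediment": "environment",
    "soil": "environment",
    "escherichia coli": "organism",
    "atlantic ocean": "location",
}

def mark_up(text):
    """Return {facet: set of term labels} found in text by naive substring match."""
    lowered = text.lower()
    tags = {}
    for label, facet in TERMS.items():
        if label in lowered:
            tags.setdefault(facet, set()).add(label)
    return tags

def facet_counts(records, facet):
    """Count how many records carry each term under one facet."""
    counts = Counter()
    for rec in records:
        counts.update(rec["tags"].get(facet, ()))
    return counts

def select(records, **chosen):
    """Keep records whose tags include every chosen facet=term pair."""
    return [
        rec for rec in records
        if all(term in rec["tags"].get(facet, ()) for facet, term in chosen.items())
    ]

# Made-up records standing in for entries from different data providers.
RECORDS = [
    {"id": "rec1", "text": "Escherichia coli isolated from soil."},
    {"id": "rec2", "text": "Metagenome of marine sediment, Atlantic Ocean."},
    {"id": "rec3", "text": "Soil sample metagenome."},
]
for rec in RECORDS:
    rec["tags"] = mark_up(rec["text"])

print(facet_counts(RECORDS, "environment"))
print([r["id"] for r in select(RECORDS, environment="soil")])
```

Each facet selection narrows the result set, and the counts show how many records remain under each term, much as an online shop shows the number of products per category. The sketch also makes the abstract's caveat concrete: a record whose text never mentions a term in the vocabulary receives no tags and cannot be reached through any facet, so retrieval is only as good as the annotations and the ontologies behind them.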

