Consolidating drug data on a global scale using Linked Data

Journal of Biomedical Semantics - Volume 8 - Pages 1-24 - 2017
Milos Jovanovik¹, Dimitar Trajanov¹
¹ Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Skopje, Macedonia

Abstract

Drug product data is available on the Web in a distributed fashion. The reasons lie within the regulatory domains, which exist on a national level. As a consequence, the drug data available on the Web is independently curated by national institutions in each country, leaving the data in varying languages, with varying structure, granularity level and format, at different locations on the Web. Therefore, one of the main challenges in the realm of drug data is the consolidation and integration of large amounts of heterogeneous data into a comprehensive dataspace, for the purpose of developing data-driven applications. In recent years, the adoption of the Linked Data principles has enabled data publishers to provide structured data on the Web and contextually interlink it with other public datasets, effectively de-siloing it. Defining methodological guidelines and specialized tools for generating Linked Data in the drug domain, applicable on a global scale, is a crucial step towards achieving the levels of data consolidation and alignment needed for the development of a global dataset of drug product data. This dataset would then enable a myriad of new usage scenarios, which can, for instance, provide insight into the global availability of different drug categories in different parts of the world.

We developed a methodology and a set of tools which support the process of generating Linked Data in the drug domain. Using them, we generated the LinkedDrugs dataset by seamlessly transforming, consolidating and publishing high-quality, 5-star Linked Drug Data from twenty-three countries, containing over 248,000 drug products, over 99,000,000 RDF triples and over 278,000 links to generic drugs from the LOD Cloud. Using the linked nature of the dataset, we demonstrate its ability to support advanced usage scenarios in the drug domain.

The process of generating the LinkedDrugs dataset demonstrates the applicability of the methodological guidelines and the supporting tools in transforming drug product data from various independent and distributed sources into a comprehensive Linked Drug Data dataset. The presented user-centric and analytical usage scenarios over the dataset show the advantages of having a de-siloed, consolidated and comprehensive dataspace of drug data available via the existing infrastructure of the Web.
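
To make the idea of transforming a national drug record into Linked Data more concrete, the following minimal Python sketch serializes one drug product as RDF triples in N-Triples syntax. It is an illustration under stated assumptions, not the authors' tooling: the record, the `example.org` URIs and the helper names (`literal`, `uri`, `drug_to_ntriples`) are hypothetical, and the choice of `schema:Drug`, `schema:name`, `schema:activeIngredient` and `rdfs:seeAlso` as the link to a generic drug is an assumption made for the example.

```python
# Minimal sketch: one drug record -> N-Triples lines (standard library only).
# All URIs under example.org and the sample record are hypothetical.

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def literal(value, lang=None):
    """Serialize a string as an N-Triples literal, escaping \\, \" and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def uri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def drug_to_ntriples(record):
    """Return a list of N-Triples lines describing a single drug product."""
    subject = uri(record["uri"])
    triples = [
        (subject, uri(RDF_TYPE), uri(SCHEMA + "Drug")),
        (subject, uri(SCHEMA + "name"), literal(record["name"], record.get("lang"))),
        (subject, uri(SCHEMA + "activeIngredient"), literal(record["active_ingredient"])),
    ]
    # Assumption: links to generic drugs in other datasets are expressed with rdfs:seeAlso.
    for link in record.get("generic_links", []):
        triples.append((subject, uri(RDFS_SEE_ALSO), uri(link)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


example = {
    "uri": "http://example.org/drug/mkd/1",          # hypothetical product URI
    "name": "Paracetamol 500 mg",                      # hypothetical product name
    "lang": "en",
    "active_ingredient": "paracetamol",
    "generic_links": ["http://example.org/generic/paracetamol"],  # hypothetical link target
}

lines = drug_to_ntriples(example)
print("\n".join(lines))
```

Because every product gets its own URI and outgoing links, records from different national registries that point to the same generic drug become connected without any shared database, which is the property the usage scenarios rely on.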
