Discovery of novel biomarkers and phenotypes by semantic technologies

BMC Bioinformatics - Volume 14 - Pages 1-17 - 2013
Carlo A Trugenberger¹, Christoph Wälti¹, David Peregrim², Mark E Sharp², Svetlana Bureeva³
¹ InfoCodex AG, Semantic Technologies, Buchs (SG), Switzerland
² Merck Research Laboratories, Rahway, USA
³ Thomson Reuters, Carlsbad, USA

Abstract

Biomarkers and target-specific phenotypes are central to targeted drug design and individualized medicine, and therefore to modern pharmaceutical research and development. The discovery of relevant biomarkers is increasingly aided by in silico techniques that apply data mining and computational chemistry to large molecular databases. However, an even larger source of valuable information can potentially be tapped for such discoveries: repositories of research documents. This paper reports on a pilot experiment to discover potential novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents. The InfoCodex semantic engine analyzed these documents directly, without prior human manipulation such as parsing. Recall and precision, measured against established but differing benchmarks, lie in ranges up to 30% and 50%, respectively. Retrieval of known entities missed by other, traditional approaches was demonstrated. Finally, the InfoCodex semantic engine was shown to discover new diabetes and obesity biomarkers and phenotypes. These included many interesting candidates with high potential, although the output also contained noticeable noise (uninteresting or obvious terms). The reported approach of employing autonomous self-organising semantic engines to aid biomarker discovery, supplemented by appropriate manual curation processes, shows promise. Conservatively, it offers a faster alternative to vocabulary-based processes that depend on humans reading and analyzing all the texts. More optimistically, it could impact pharmaceutical research more broadly, for example by shortening the time-to-market of novel drugs or by speeding up early recognition of dead ends and adverse reactions.
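For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how precision and recall of a list of extracted terms can be scored against a benchmark list. It is not the evaluation procedure used in the paper: the function name `precision_recall`, the exact-match rule after lowercasing, and the toy term lists are all assumptions made for illustration.

```python
def precision_recall(extracted, benchmark):
    """Score extracted terms against a benchmark term list.

    Assumption: a term counts as a hit only on an exact match after
    trimming whitespace and lowercasing. A real evaluation would likely
    need synonym handling, which this sketch does not attempt.
    """
    extracted_set = {term.strip().lower() for term in extracted}
    benchmark_set = {term.strip().lower() for term in benchmark}
    hits = extracted_set & benchmark_set

    # Precision: share of extracted terms that appear in the benchmark.
    precision = len(hits) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of benchmark terms that were extracted.
    recall = len(hits) / len(benchmark_set) if benchmark_set else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
extracted_terms = ["adiponectin", "leptin", "insulin", "body weight"]
benchmark_terms = ["Adiponectin", "Leptin", "HbA1c", "C-peptide", "Proinsulin"]

precision, recall = precision_recall(extracted_terms, benchmark_terms)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With this toy data, two of the four extracted terms are in the benchmark and two of the five benchmark terms are recovered. Terms outside the benchmark lower precision even when they are genuinely novel candidates, which is one reason scores against an established benchmark understate the value of a discovery-oriented system.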
