A document level neural model integrated domain knowledge for chemical-induced disease relations

BMC Bioinformatics - Tập 19 - Trang 1-12 - 2018
Wei Zheng1,2, Hongfei Lin1, Xiaoxia Liu1, Bo Xu1
1College of Computer Science and Technology, Dalian University of Technology, Dalian, China
2College of Software, Dalian JiaoTong University, Dalian, China

Tóm tắt

The effective combination of texts and knowledge may improve performances of natural language processing tasks. For the recognition of chemical-induced disease (CID) relations which may span sentence boundaries in an article, although existing CID systems explored the utilization for knowledge bases, the effects of different knowledge on the identification of a special CID haven’t been distinguished by these systems. Moreover, systems based on neural network only constructed sentence or mention level models. In this work, we proposed an effective document level neural model integrated domain knowledge to extract CID relations from biomedical articles. Basic semantic information of an article with respect to a special CID candidate pair was learned from the document level sub-network module. Furthermore, knowledge attention depending on the representation of the article was proposed to distinguish the influences of different knowledge on the special CID pair and then the final representation of knowledge was formed by aggregating weighed knowledge. Finally, the integrated representations of texts and knowledge were passed to a softmax classifier to perform the CID recognition. Experimental results on the chemical-disease relation corpus proposed by BioCreative V show that our proposed system integrated knowledge achieves a good overall performance compared with other state-of-the-art systems. Experimental analyses demonstrate that the introduced attention mechanism on domain knowledge plays a significant role in distinguishing influences of different knowledge on the judgment for a special CID relation.

Tài liệu tham khảo

Islamaj Dogan R, Murray GC, Névéol A, Lu Z. Understanding PubMed® user search behavior through log analysis. Database. 2009;2009:bap018. Névéol A, Doğan RI, Lu Z. Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction. J Biomed Inform. 2011;44(2):310–8. Davis AP, Grondin CJ, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, King BL, Wiegers TC, Mattingly CJ. The comparative Toxicogenomics Database’s 10th year anniversary: update 2015. Nucleic Acids Res. 2014;43(D1):D914–20. Wei C-H, Peng Y, Leaman R, Davis AP, Mattingly CJ, Li J, Wiegers TC, Lu Z. Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task. Database. 2016;2016:baw032. Davis AP, Wiegers TC, Roberts PM, King BL, Lay JM, Lennon-Hopkins K, Sciaky D, Johnson R, Keating H, Greene N. A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions. Database. 2013;2013:bat080. Li J, Sun Y, Johnson RJ, Sciaky D, Wei C-H, Leaman R, Davis AP, Mattingly CJ, Wiegers TC, Lu Z. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database. 2016;2016:baw068. Peng Y, Wei C-H, Lu Z. Improving chemical disease relation extraction with rich features and weakly labeled data. J Cheminform. 2016;8(1):53. Pons E, Becker BF, Akhondi SA, Afzal Z, Van Mulligen EM, Kors JA. Extraction of chemical-induced diseases using prior knowledge and textual information. Database. 2016;2016:baw046. Xu J, Wu Y, Zhang Y, Wang J, Lee H-J, Xu H. CD-REST: a system for extracting chemical-induced disease relation in literature. Database. 2016;2016:baw036. Lowe DM, O’Boyle NM, Sayle RA. Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall. Database. 2016;2016:baw039. Alam F, Corazza A, Lavelli A, Zanoli R. A knowledge-poor approach to chemical-disease relation extraction. Database. 2016;2016. Zhou H, Deng H, Chen L, Yang Y, Jia C, Huang D. Exploiting syntactic and semantics information for chemical–disease relation extraction. Database. 2016;2016:baw048. Gu J, Sun F, Qian L, Zhou G. Chemical-induced disease relation extraction via convolutional neural network. Database. 2017;2017(1). https://doi.org/10.1093/database/bax024. Jiang Z, Jin L, Li L, Qin M, Qu C, Zheng J, Huang D. A CRD-WEL system for chemical-disease relations extraction. In: The fifth BioCreative challenge evaluation workshop; 2015. p. 317–26. Li Z, Yang Z, Lin H, Wang J, Gui Y, Zhang Y, Wang L. CIDExtractor: A chemical-induced disease relation extraction system for biomedical literature. In: Bioinformatics and Biomedicine (BIBM), 2016 IEEE International Conference on IEEE. 2016:994–1001. Verga P, Strubell E, Shai O, Mccallum A. Attending to all mention pairs for full abstract biological relation extraction. arXiv preprint arXiv: 171008312. 2017. Li H, Chen Q, Tang B, Wang X. Chemical-induced disease extraction via convolutional neural networks with attention. In: IEEE International Conference on Bioinformatics and Biomedicine; 2017. p. 1276–9. Gu J, Qian L, Zhou G. Chemical-induced disease relation extraction with various linguistic features. Database. 2016;2016:baw042. Fillmore CJ. Frame semantics and the nature of language. Ann N Y Acad Sci. 1976;280(1):20–32. Minsky ML. Society of mind. Columbia: Simon & Schuster; 1985. Ahn S, Choi H, Pärnamaa T, Bengio Y. A neural knowledge language model. arXiv preprint arXiv:1608.00318. 2016. Yang B, Mitchell T. Leveraging knowledge bases in LSTMs for improving machine reading. In: Meeting of the Association for Computational Linguistics; 2017. p. 1436–46. Toutanova K, Chen D, Pantel P, Poon H, Choudhury P, Gamon M. Representing Text for Joint Embedding of Text and Knowledge Bases. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing: 2015. Hao Y, Zhang Y, Liu K, He S, Liu Z, Wu H, Zhao J. An end-to-end model for question answering over Knowledge Base with cross-attention combining global knowledge. In: Meeting of the Association for Computational Linguistics; 2017. p. 221–31. Das R, Zaheer M, Reddy S, Mccallum A. Question answering on knowledge bases and text using Universal Schema and Memory Networks. arXiv preprint arXiv: 1704.08384. 2017:358–365. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv: 1508.04025. 2015. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv: 1409.0473. 2014. Wei Z, Lin H, Ling L, Zhao Z, Li Z, Zhang Y, Yang Z, Jian W. An attention-based effective neural model for drug-drug interactions extraction. BMC Bioinform. 2017;18(1):445. Wang L, Cao Z, de Melo G, Liu Z. Relation classification via multi-level attention cnns. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics Association for Computational Linguistics. 2016. Zhang Y, Zheng W, Lin H, Wang J, Yang Z, Dumontier M. Drug-drug interaction extraction via hierarchical RNNs on sequence and shortest dependency paths. Bioinformatics. 2017;34(5):828–35. Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D. The stanford corenlp natural language processing toolkit. In: ACL (System Demonstrations); 2014. p. 55–60. Bengio Y, Ducharme R, Vincent P, Jauvin C. A neural probabilistic language model. J Mach Learn Res. 2003;3(Feb):1137–55. Elman JL. Finding structure in time. Cogn Sci. 1990;14(2):179–211. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. Natural language processing (almost) from scratch. J Mach Learn Res. 2011;12(Aug):2493–537. Tieleman T, Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning. 2012;4(2):26–31. Zheng W, Lin H, Li Z, et al. An effective neural model extracting document level chemical-induced disease relations from biomedical literature. J Biomed Inform. 2018;83:1–9. Bordes A, Usunier N, Weston J, Yakhnenko O. Translating embeddings for modeling multi-relational data. In: International Conference on Neural Information Processing Systems; 2013. p. 2787–95. Wei C-H, Kao H-Y, Lu Z. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res. 2013;41(W1):W518–22. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. Comp Sci. 2013:3111–19. Zeng D, Liu K, Lai S, Zhou G, Zhao J. Relation classification via convolutional deep neural network. In: Proceedings of COLING: 2014. p. 2335–2344. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv: 1207.0580. 2012.