A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature
Abstract
Keywords
References
Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A: CHEMDNER: The drugs and chemical names extraction challenge. J Cheminform. 2015, 7 (Suppl 1): S1.
Li J, Zhu X, Chen JY: Building disease-specific drug-protein connectivity maps from molecular interaction networks and PubMed abstracts. PLoS Computational Biology. 2009, 5 (7): e1000450. doi:10.1371/journal.pcbi.1000450
Eltyeb S, Salim N: Chemical named entities recognition: A review on approaches and applications. Journal of Cheminformatics. 2014, 6 (17): 1-12. doi:10.1186/1758-2946-6-17
Vazquez M, Krallinger M, Leitner F, Valencia A: Text mining for drugs and chemical compounds: Methods, tools and applications. Molecular Informatics. 2011, 30 (6-7): 506-519. doi:10.1002/minf.201100005
Krallinger M, Morgan A, Smith L, Leitner F, Tanabe L, Wilbur J, Hirschman L, Valencia A: Evaluation of text-mining systems for biology: Overview of the second BioCreative community challenge. Genome Biology. 2008, 9 (Suppl 2): S1. doi:10.1186/gb-2008-9-S2-S1
Xu S, An X, Zhu L, Zhang Y, Zhang H: A CRF-based system for recognizing chemical entities in biomedical literature. Proceedings of the 4th BioCreative Challenge Evaluation Workshop. Edited by: Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A. 2013, 2: 152-157.
Xu S, Ma F, Tao L: Learn from the information contained in the false splice sites as well as in the true splice sites using SVM. Proceedings of the International Conference on Intelligent Systems and Knowledge Engineering. 2007, Atlantis Press, Amsterdam, Netherlands, 1360-1366. doi:10.2991/iske.2007.13
Xu S: Selenoprotein genes prediction in silico based on machine learning approaches. PhD thesis. 2008, China Agricultural University
Mikolov T, Chen K, Corrado G, Dean J: Efficient estimation of word representations in vector space. Proceedings of the International Conference on Learning Representations. 2013
Liang P: Semi-supervised learning for natural language. Master's thesis. 2005, Massachusetts Institute of Technology
Turian J, Ratinov L, Bengio Y: Word representations: A simple and general method for semi-supervised learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA. 2010, 384-394.
Lafferty J, McCallum A, Pereira F: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning. 2001, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 282-289.
Krallinger M, Rabal O, Leitner F, Vazquez M, Salgado D, Lu Z, Leaman R, Lu Y, Ji D, Lowe DM, Sayle RA, Batista-Navarro RT, Rak R, Huber T, Rocktaschel T, Matos S, Campos D, Tang B, Xu H, Munkhdalai T, Ryu KH, Ramanan SV, Nathan S, Zitnik S, Bajec M, Weber L, Irmer M, Akhondi SA, Kors JA, Xu S, An X, Sikdar UK, Ekbal A, Yoshioka M, Dieb TM, Choi M, Verspoor K, Khabsa M, Giles CL, Liu H, Ravikumar KE, Lamurias A, Couto FM, Dai H, Tsai RT, Ata C, Can T, Usie A, Alves R, Segura-Bedmar I, Martinez P, Oyarzabal J, Valencia A: The CHEMDNER corpus of chemicals and drugs and its annotation principles. J Cheminform. 2015, 7 (Suppl 1): S2.
Sha F, Pereira F: Shallow parsing with conditional random fields. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA. 2003, 213-220. doi:10.3115/1073445.1073473
Miller S, Guinness J, Zamanian A: Name tagging with word clusters and discriminative training. Proceedings of Conference on Human Language Technology/North American Chapter of the Association for Computational Linguistics Annual Meeting. 2004, Association for Computational Linguistics, Boston, Massachusetts, 337-342.
Ganchev K, Crammer K, Pereira F, Mann G, Bellare K, McCallum A, Carroll S, Jin Y, White P: Penn/UMass/CHOP BioCreative II systems. Proceedings of the 2nd BioCreative Challenge Evaluation Workshop. 2007, 23: 119-124.
Brown PF, deSouza PV, Mercer RL, Pietra VJD, Lai JC: Class-based n-gram models of natural language. Computational Linguistics. 1992, 18 (4): 467-479.
Finkel JR, Manning CD: Nested named entity recognition. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, PA, USA. 2009, 141-150.
The Apache OpenNLP Library. [http://opennlp.apache.org/index.html]
Read J, Dridan R, Oepen S, Solberg LJ: Sentence boundary detection: A long solved problem? Proceedings of the 24th International Conference on Computational Linguistics. Edited by: Kay M, Boitet C. 2012, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India, 985-994.
Wei C-H, Harris BR, Kao H-Y, Lu Z: tmVar: A text mining approach for extracting sequence variants in biomedical literature. Bioinformatics. 2013, 29 (11): 1433-1439.
McDonald R, Pereira F: Identifying gene and protein mentions in text using conditional random fields. BMC Bioinformatics. 2005, 6 (Suppl 1): S6. doi:10.1186/1471-2105-6-S1-S6
Huang H-S, Lin Y-S, Lin K-T, Kuo C-J, Chang Y-M, Yang B-H, Chung I-F, Hsu C-N: High-recall gene mention recognition by unification of multiple background parsing models. Proceedings of the 2nd BioCreative Challenge Evaluation Workshop. 2007, 23: 109-111.
Klinger R, Friedrich CM, Fluck J, Hofmann-Apitius M: Named entity recognition with combinations of conditional random fields. Proceedings of the 2nd BioCreative Challenge Evaluation Workshop. Edited by: Hirschman L, Krallinger M, Valencia A. 2007, 89-92.
Liu DC, Nocedal J: On the limited memory BFGS method for large scale optimization. Mathematical Programming. 1989, 45 (3): 503-528. doi:10.1007/BF01589116
Kudo T: CRF++: Yet Another CRF Toolkit. [http://crfpp.googlecode.com/svn/trunk/doc/index.html]
Manning C, Bauer J: Stanford CoreNLP - A Suite of NLP Tools. [http://nlp.stanford.edu/software/corenlp.shtml]
Collobert R, Weston J: A unified architecture for natural language processing: Deep neural networks with multitask learning. Proceedings of the 25th International Conference on Machine Learning. 2008
Mnih A, Hinton G: A scalable hierarchical distributed language model. Advances in Neural Information Processing Systems 21. Edited by: Koller D, Schuurmans D, Bengio Y, Bottou L. 2009, MIT Press, Cambridge, MA, 1081-1088.
Liang P: C++ Implementation of the Brown Word Clustering Algorithm. [https://github.com/percyliang/brown-cluster]