Stacked ensemble combined with fuzzy matching for biomedical named entity recognition of diseases

Journal of Biomedical Informatics, Volume 64, Pages 1–9, 2016
Balu Bhasuran¹, Gurusamy Murugesan², Sabenabanu Abdulkadhar², Jeyakumar Natarajan¹,²
¹ DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore 641046, India
² Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore 641046, India

References

Zhu, 2013, Biomedical text mining and its applications in cancer research, J. Biomed. Inform., 46, 200, 10.1016/j.jbi.2012.10.007
Cohen, 2005, A survey of current work in biomedical text mining, Briefings Bioinform., 6, 57, 10.1093/bib/6.1.57
Lin, 2004, A maximum entropy approach to biomedical named entity recognition, 56
Jimeno, 2008, Assessment of disease named entity recognition on a corpus of annotated sentences, BMC Bioinform., 9, S3, 10.1186/1471-2105-9-S3-S3
Lafferty, 2001, Conditional random fields: probabilistic models for segmenting and labeling sequence data
Jonnalagadda, 2013, Using empirically constructed lexical resources for named entity recognition, Biomed. Inform. Insights, 6, 17
Collier, 2000, Extracting the names of genes and gene products with a hidden Markov model, vol. 1, 201
McCallum, 2000, Maximum entropy Markov models for information extraction and segmentation, vol. 17, 591
Neves, 2010, Moara: a Java library for extracting and normalizing gene and protein mentions, BMC Bioinform., 11, 157, 10.1186/1471-2105-11-157
Krauthammer, 2004, Term identification in the biomedical literature, J. Biomed. Inform., 37, 512, 10.1016/j.jbi.2004.08.004
Campos, 2013, A modular framework for biomedical concept recognition, BMC Bioinform., 14, 281, 10.1186/1471-2105-14-281
Leaman, 2015, tmChem: a high performance approach for chemical named entity recognition and normalization, J. Cheminform., 7
Huang, 2013, Disease named entity recognition by machine learning using semantic type of metathesaurus, Int. J. Mach. Learn. Comput., 3, 494, 10.7763/IJMLC.2013.V3.367
Korkontzelos, 2015, Boosting drug named entity recognition using an aggregate classifier, Artif. Intell. Med., 65, 145, 10.1016/j.artmed.2015.05.007
Ekbal, 2013, Biomedical named entity extraction: some issues of corpus compatibilities, SpringerPlus, 2, 601, 10.1186/2193-1801-2-601
Li, 2012, Disease mention recognition using soft-margin SVM, Training, 593, 5
Doğan, 2012, An improved corpus of disease mentions in PubMed citations
Karp, 1987, Efficient randomized pattern-matching algorithms, IBM J. Res. Dev., 31, 249, 10.1147/rd.312.0249
Boyer, 1977, A fast string searching algorithm, Commun. ACM, 20, 762, 10.1145/359842.359859
Doğan, 2014, NCBI disease corpus: a resource for disease name recognition and concept normalization, J. Biomed. Inform., 47, 1, 10.1016/j.jbi.2013.12.006
Wei, 2015, Overview of the BioCreative V chemical disease relation (CDR) task
Toutanova, 2000, Enriching the knowledge sources used in a maximum entropy part-of-speech tagger, vol. 13, 63
Lipscomb, 2000, Medical subject headings (MeSH), Bull. Med. Libr. Assoc., 88, 265
Hewett, 2002, PharmGKB: the pharmacogenetics knowledge base, Nucl. Acids Res., 30, 163, 10.1093/nar/30.1.163
Bodenreider, 2004, The unified medical language system (UMLS): integrating biomedical terminology, Nucl. Acids Res., 32, D267, 10.1093/nar/gkh061
Osborne, 2009, Annotating the human genome with Disease Ontology, BMC Genom., 10, S6, 10.1186/1471-2164-10-S1-S6
Elkin, 2006, Evaluation of the content coverage of SNOMED CT: ability of SNOMED clinical terms to represent clinical problem lists, vol. 81, no. 6, 741
Davis, 2012, MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database, Database, bar065
Hamosh, 2005, Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders, Nucl. Acids Res., 33, D514
Nadeau, 2007, A survey of named entity recognition and classification, Lingvisticae Investigationes, 30, 3, 10.1075/li.30.1.03nad
Lafferty, 2001, Conditional random fields: probabilistic models for segmenting and labeling sequence data
Munkhdalai, 2015, Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations, J. Cheminform., 7, S9, 10.1186/1758-2946-7-S1-S9
Lavergne, 2010, Practical very large scale CRFs
Bergstra, 2012, Random search for hyper-parameter optimization, J. Mach. Learn. Res., 13, 281
Yang, 2008, Exploiting the contextual cues for bio-entity name recognition in biomedical literature, J. Biomed. Inform., 41, 580, 10.1016/j.jbi.2008.01.002
Ekbal, 2011, Weighted vote-based classifier ensemble for named entity recognition: a genetic algorithm-based approach, ACM Trans. Asian Lang. Inform. Process. (TALIP), 10, 9, 10.1145/1967293.1967296
Wang, 2008, Identifying named entities in biomedical text based on stacked generalization, 7th World Congress on Intelligent Control and Automation (WCICA 2008), IEEE, 160
Dasarathy, 1979, A composite classifier system design: concepts and methodology, Proc. IEEE, 67, 708, 10.1109/PROC.1979.11321
Zhou, 2002, Ensembling neural networks: many could be better than all, Artif. Intell., 137, 239, 10.1016/S0004-3702(02)00190-X
Zhou, 2005, Recognition of protein/gene names from text using an ensemble of classifiers, BMC Bioinform., 6, 1, 10.1186/1471-2105-6-1
Wolpert, 1992, Stacked generalization, Neural Networks, 5, 241, 10.1016/S0893-6080(05)80023-1
Bonissone, 1979, The problem of linguistic approximation in system analysis
Eshragh, 1979, A general approach to linguistic approximation, Int. J. Man Mach. Stud., 11, 501, 10.1016/S0020-7373(79)80040-1
Wenstøp, 1980, Quantitative analysis with linguistic values, Fuzzy Sets Syst., 4, 99, 10.1016/0165-0114(80)90031-7
Zwick, 1987, Measures of similarity among fuzzy concepts: a comparative analysis, Int. J. Approx. Reason., 1, 221, 10.1016/0888-613X(87)90015-6
Wei, 2013, PubTator: a web-based text mining tool for assisting biocuration, Nucl. Acids Res., gkt441