Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm
Abstract
Shared tasks and community challenges are key instruments for promoting research and collaboration and for determining the state of the art of biomedical and chemical text mining technologies. Traditionally, such tasks compared automatically generated results against a so-called Gold Standard dataset of manually labelled textual data, without regard to the efficiency and robustness of the underlying implementations. Because unstructured data collections, including patent databases and particularly the scientific literature, are growing rapidly, there is a pressing need to generate, assess and expose robust big data text mining solutions that semantically enrich documents in real time. To address this need, a novel track called “Technical interoperability and performance of annotation servers” was launched under the umbrella of the BioCreative text mining evaluation effort. The aim of this track was to enable the continuous assessment of technical aspects of text annotation web servers, specifically of online biomedical named entity recognition systems of interest for medicinal chemistry applications.
A total of 15 out of 26 registered teams successfully implemented online annotation servers. They returned predictions in predefined formats during a two-month period and were evaluated through the BeCalm evaluation platform, which was developed specifically for this track. The track encompassed three levels of evaluation: data format considerations, technical metrics and functional specifications. Participating annotation servers were implemented in seven different programming languages and covered 12 general entity types. The continuous evaluation of server responses covered testing periods of low activity and of moderate to high activity, encompassing a total of 4,092,502 requests from three different document provider settings. The median response time was below 3.74 s, with a median of 10 annotations/document. Most of the servers showed great reliability and stability, being able to process over 100,000 requests in a 5-day period.
The presented track was a novel experimental task that systematically evaluated the technical performance aspects of online entity recognition systems. It raised the interest of a significant number of participants. Future editions of the competition will address the ability to process documents in bulk as well as to annotate full-text documents.
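As a rough illustration of the kind of technical metrics reported above (median response time, median annotations per document, share of successful requests), the following minimal sketch summarizes a small request log. The record layout, the server names and the `summarize` function are hypothetical and are not taken from the BeCalm platform.

```python
from statistics import median

# Hypothetical request log, one tuple per request:
# (server_id, response_time_seconds, number_of_annotations, succeeded)
requests_log = [
    ("server_a", 1.2, 8, True),
    ("server_a", 3.9, 12, True),
    ("server_a", 0.0, 0, False),   # failed request, excluded from the medians
    ("server_b", 2.5, 10, True),
    ("server_b", 4.1, 15, True),
]

def summarize(log):
    """Return simple technical metrics for a list of request records."""
    ok = [r for r in log if r[3]]  # keep only successful requests
    return {
        "requests": len(log),
        "success_rate": len(ok) / len(log),
        "median_response_s": median(r[1] for r in ok),
        "median_annotations": median(r[2] for r in ok),
    }

summary = summarize(requests_log)
print(summary)
```

Computing the medians over successful requests only is a design choice of this sketch; an evaluation platform could equally penalize failures in the timing statistics.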