Section level search functionality in Europe PMC

Journal of Biomedical Semantics - Volume 6 - Pages 1-5 - 2015
Şenay Kafkas¹, Xingjun Pi¹, Nikos Marinos¹, Francesco Talo’¹, Andrew Morrison¹, Johanna R McEntyre¹
¹European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom

Abstract

As the availability of open access full text research articles increases, so does the need for sophisticated search services that make the most of this new content. Here, we present a new feature available in Europe PMC that allows selected sections of full text articles to be searched, including figures and reference lists. Users can now restrict a search to particular parts of an article, reducing noise and allowing fine-tuning of searches. To the best of our knowledge, Europe PMC is the first service that provides a granular literature search by allowing users to target their search to particular sections of articles. This new functionality is based on a heuristic algorithm that identifies article sections and assigns them to 17 pre-defined categories based on the section heading. Evaluated against a manually curated dataset of 100 full text articles, the tagger achieves an F-score of 98.02%. The section search is available from the advanced search within Europe PMC (http://europepmc.org). The source code is freely available from http://europepmc.org/ftp/oa/SectionTagger/.
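To illustrate the general idea of heading-based section tagging, the following is a minimal Python sketch, not the published SectionTagger. The category names, the keyword patterns, and the function names `tag_heading` and `f_score` are assumptions made for illustration. The abstract does not list the 17 categories or the rules used to assign them, so only a handful of plausible categories appear here. The `f_score` helper shows the standard F1 definition (harmonic mean of precision and recall), which is assumed to be the measure behind the reported figure.

```python
import re

# Hypothetical keyword rules. The real tagger uses 17 pre-defined
# categories; these six and their patterns are illustrative assumptions.
SECTION_RULES = [
    ("INTRODUCTION", re.compile(r"\b(introduction|background)\b", re.I)),
    ("METHODS", re.compile(r"\b(methods?|materials)\b", re.I)),
    ("RESULTS", re.compile(r"\bresults?\b", re.I)),
    ("DISCUSSION", re.compile(r"\bdiscussion\b", re.I)),
    ("CONCLUSIONS", re.compile(r"\bconclusions?\b", re.I)),
    ("REFERENCES", re.compile(r"\b(references|bibliography)\b", re.I)),
]

# Leading numbering such as "2." or "3.1" in front of a heading.
NUMBERING = re.compile(r"^\s*\d+(\.\d+)*\.?\s+")


def tag_heading(heading):
    """Return every category whose pattern matches the section heading.

    A combined heading such as "Results and Discussion" receives both
    labels. Headings that match no rule are labelled "OTHER".
    """
    cleaned = NUMBERING.sub("", heading)
    labels = [name for name, pattern in SECTION_RULES if pattern.search(cleaned)]
    return labels or ["OTHER"]


def f_score(true_pos, false_pos, false_neg):
    """Standard F1: harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for heading in ["1. Introduction", "2. Materials and Methods",
                    "Results and Discussion", "Acknowledgements"]:
        print(heading, "->", tag_heading(heading))
```

Allowing one heading to carry several labels is a design choice of this sketch: it lets a search restricted to either "Results" or "Discussion" still reach a merged section. Whether the published tagger handles merged headings this way is not stated in the abstract.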
