A systematic review of automatic text summarization for biomedical literature and EHRs

Oxford University Press (OUP) - Volume 28, Issue 10 - Pages 2287-2297 - 2021
Mengqian Wang1, Manhua Wang2, Fei Yu3,2, Yue Yang1, Jennifer Bissram3, Javed Mostafa4,1,2
1Carolina Health Informatics Program, University of North Carolina, Chapel Hill, North Carolina, USA
2iSchool, University of North Carolina, Chapel Hill, North Carolina, USA
3Health Sciences Library, University of North Carolina, Chapel Hill, North Carolina, USA
4Biomedical Research Imaging Center, the School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA

Abstract

Objective
Biomedical text summarization helps biomedical information seekers avoid information overload by reducing the length of a document while preserving the essence of its content. Our systematic review investigates the most recent research on summarizing biomedical literature and electronic health records by analyzing the techniques, areas of application, and evaluation methods used. We identify gaps and propose potential directions for future research.

Materials and Methods
This review followed the PRISMA methodology and replicated the approach adopted by the previous systematic review published on the same topic. We searched 4 databases (PubMed, ACM Digital Library, Scopus, and Web of Science) from January 1, 2013 to April 8, 2021. Two reviewers independently screened the titles, abstracts, and full texts of all retrieved articles. Conflicts were resolved by a third reviewer. Data were extracted from the included articles along 5 dimensions: input, purpose, output, method, and evaluation.

Results
Fifty-eight of the 7235 retrieved articles met the inclusion criteria. Thirty-nine systems used single-document biomedical research literature as their input, 17 systems were explicitly designed for clinical support, 47 systems generated extractive summaries, and 53 systems adopted hybrid methods combining computational linguistics, machine learning, and statistical approaches. As for assessment, 51 studies conducted an intrinsic evaluation using predefined metrics.

Discussion and Conclusion
This study found that current biomedical text summarization systems have achieved good performance using hybrid methods. The number of studies on summarizing electronic health records has increased since the previous survey. However, most work still focuses on summarizing literature.
