Development and validation of deep learning and BERT models for classification of lung cancer radiology reports

Informatics in Medicine Unlocked - Volume 40 - Article 101294 - 2023
S. Mithun (1,2,3), Ashish Kumar Jha (1,2,3), Umesh B. Sherkhane (1,2), Vinay Jaiswar (2), Nilendu C. Purandare (2,3), V. Rangarajan (2,3), A. Dekker (1), Sander Puts (1), Inigo Bermejo (1), L. Wee (1)
(1) Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
(2) Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India
(3) Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India