Automated detection of altered mental status in emergency department clinical notes: a deep learning approach

BMC Medical Informatics and Decision Making - Volume 19 - Pages 1-9 - 2019
Jihad S. Obeid1,2, Erin R. Weeda3, Andrew J. Matuskowitz4, Kevin Gagnon5, Tami Crawford1, Christine M. Carr4,1, Lewis J. Frey1,2
1Biomedical Informatics Center, Medical University of South Carolina, Charleston, USA
2Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
3Department of Clinical Pharmacy and Outcome Sciences, Medical University of South Carolina, Charleston, USA
4Department of Emergency Medicine, Medical University of South Carolina, Charleston, USA
5Department of Computer Science and Engineering, University of South Carolina, Columbia, USA

Abstract

Machine learning has been used extensively in clinical text classification tasks, and deep learning approaches using word embeddings have recently been gaining momentum in biomedical applications. To automate the identification of altered mental status (AMS) in emergency department provider notes for the purpose of decision support, we compare the performance of classic bag-of-words-based machine learning classifiers with that of newer deep learning approaches.

We used a case-control study design to extract an adequate number of clinical notes with and without AMS, based on ICD codes. The notes were parsed to extract the history of present illness, which served as the clinical text for the classifiers, and were manually labeled by clinicians. As a baseline for comparison, we tested several traditional bag-of-words-based classifiers. We then tested several deep learning models using a convolutional neural network architecture with three different types of word embeddings: a pre-trained word2vec model and two models without pre-training but with different word embedding dimensions.

We evaluated the models on 1130 labeled notes from the emergency department. The deep learning models had the best overall performance, with an area under the ROC curve of 98.5% and an accuracy of 94.5%. Pre-training word embeddings on the unlabeled corpus reduced the number of training iterations, and the resulting performance was not statistically different from that of the other deep learning models.

This supervised deep learning approach performs very well for the detection of AMS symptoms in clinical text in our environment. Further work is needed to establish the generalizability of these findings, including evaluation of these models on other types of clinical notes and in other environments. The results are promising for the eventual use of these types of classifiers, in combination with other information derived from electronic health records, as input for clinical decision support.
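To make the baseline and the reported metrics concrete, the following is a minimal sketch of a bag-of-words classifier scored with area under the ROC curve and accuracy. It is not the authors' implementation: the abstract does not name the baseline classifiers, so multinomial naive Bayes is an assumption chosen for illustration. The function names and the toy notes are hypothetical and are not data from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(notes, labels):
    """Collect per-class bag-of-words counts (label 1 = AMS, 0 = non-AMS)."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter(labels)
    for text, label in zip(notes, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def ams_score(model, text):
    """Log-odds that a note is AMS-positive, with Laplace smoothing."""
    counts, docs, vocab = model
    score = math.log(docs[1] / docs[0])  # class prior; needs both classes present
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    for tok in tokenize(text):
        if tok in vocab:  # words unseen in training are ignored
            score += math.log((counts[1][tok] + 1) / totals[1])
            score -= math.log((counts[0][tok] + 1) / totals[0])
    return score


def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.0):
    """Fraction of notes classified correctly when thresholding the log-odds."""
    preds = [int(s > threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy notes standing in for history-of-present-illness text.
train_notes = [
    "patient confused and disoriented at home",
    "family reports lethargy and confusion since morning",
    "ankle pain after a fall while running",
    "cough and fever for three days",
]
train_labels = [1, 1, 0, 0]
test_notes = ["confused and lethargic this morning", "knee pain after running"]
test_labels = [1, 0]

model = train_naive_bayes(train_notes, train_labels)
test_scores = [ams_score(model, note) for note in test_notes]
print("AUC:", roc_auc(test_scores, test_labels))
print("Accuracy:", accuracy(test_scores, test_labels))
```

The pairwise AUC computation is quadratic in the number of notes, which is acceptable at the scale of roughly a thousand labeled notes but would normally be replaced by a rank-based computation or a library routine on larger corpora.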
