Visualizing Clinical Data Retrieval and Curation in Multimodal Healthcare AI Research: A Technical Note on RIL-workflow
Journal of Imaging Informatics in Medicine - Pages 1-9 - 2024
Abstract
Curating and integrating data from multiple sources are bottlenecks to procuring robust training datasets for artificial intelligence (AI) models in healthcare. While numerous applications can process discrete types of clinical data, integrating heterogeneous data types remains time-consuming. There is therefore a need for more efficient retrieval and storage of curated patient data from dissimilar sources, such as biobanks, health records, and sensors. We describe a customizable, modular data retrieval application (RIL-workflow), which integrates clinical notes, images, and prescription data, and show its feasibility when applied to research at our institution. It uses the workflow automation platform Camunda (Camunda Services GmbH, Berlin, Germany) to collect internal data from Fast Healthcare Interoperability Resources (FHIR) and Digital Imaging and Communications in Medicine (DICOM) sources. Through the web-based graphical user interface (GUI), the workflow runs tasks to completion according to their visual representation, retrieving and storing results for patients who meet study inclusion criteria while segregating errors for human review. We showcase RIL-workflow with its library of ready-to-use modules, which lets researchers specify human input or automation at fixed steps. We validated the workflow by demonstrating its capability to aggregate and curate data from multiple sources, and to handle the associated errors, in order to generate a multimodal database for clinical AI research. Further, we solicited user feedback to highlight the pros and cons associated with RIL-workflow. The source code is available at github.com/magnooj/RIL-workflow.
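The control flow described in the abstract (run a fixed sequence of tasks per patient, keep results only for patients who meet inclusion criteria, and set failures aside for human review) can be illustrated with a minimal Python sketch. This is not the RIL-workflow implementation and does not use Camunda, FHIR, or DICOM APIs; the names `meets_inclusion`, `fetch_notes`, `fetch_images`, and `run_workflow`, the age-based criterion, and the sample patient records are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical task signature: takes a patient record, returns partial results.
Task = Callable[[dict], dict]


@dataclass
class WorkflowResult:
    curated: Dict[str, dict] = field(default_factory=dict)  # patient id -> merged task outputs
    needs_review: List[dict] = field(default_factory=list)  # failures set aside for a human


def meets_inclusion(patient: dict, min_age: int = 18) -> bool:
    """Hypothetical inclusion criterion (adults only), purely for illustration."""
    return patient.get("age", 0) >= min_age


def fetch_notes(patient: dict) -> dict:
    """Stand-in for a clinical-note retrieval step; raises KeyError if notes are missing."""
    return {"notes": patient["notes"]}


def fetch_images(patient: dict) -> dict:
    """Stand-in for an imaging retrieval step; raises KeyError if no studies are listed."""
    return {"image_count": len(patient["study_uids"])}


def run_workflow(patients: List[dict], tasks: List[Task]) -> WorkflowResult:
    """Run every task in order for each eligible patient, segregating errors."""
    result = WorkflowResult()
    for patient in patients:
        if not meets_inclusion(patient):
            continue  # excluded patients are neither stored nor flagged
        record: dict = {}
        try:
            for task in tasks:
                record.update(task(patient))
        except Exception as exc:
            # Record which step failed so a reviewer can follow up; skip storing this patient.
            result.needs_review.append(
                {"patient_id": patient["id"], "task": task.__name__, "error": repr(exc)}
            )
            continue
        result.curated[patient["id"]] = record
    return result


# Made-up sample records: p2 fails inclusion, p3 lacks imaging studies.
patients = [
    {"id": "p1", "age": 54, "notes": ["CT chest normal"], "study_uids": ["1.2.3"]},
    {"id": "p2", "age": 12, "notes": ["Well-child visit"], "study_uids": ["1.2.4"]},
    {"id": "p3", "age": 70, "notes": ["MRI brain"]},
]
result = run_workflow(patients, [fetch_notes, fetch_images])
print(sorted(result.curated))                       # ids of patients stored
print([e["patient_id"] for e in result.needs_review])  # ids routed to human review
```

In this sketch a failing step stops processing for that patient only, so one bad record does not block the rest of the cohort; whether RIL-workflow isolates errors at the patient level or the task level is not stated in the abstract.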