Evaluation of an Automated Information Extraction Tool for Imaging Data Elements to Populate a Breast Cancer Screening Registry

Journal of Digital Imaging - Volume 28 - Pages 567-575 - 2015
Ronilda Lacson1,2, Kimberly Harris3, Phyllis Brawarsky3, Tor D. Tosteson4, Tracy Onega4, Anna N. A. Tosteson4,5, Abby Kaye3, Irina Gonzalez3, Robyn Birdwell1,2, Jennifer S. Haas3,2
1Department of Radiology, Brigham and Women’s Hospital, Boston, USA
2Harvard Medical School, Boston, USA
3Department of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, USA
4Department of Community and Family Medicine, The Dartmouth Institute for Health Policy and Clinical Practice, Lebanon, USA
5Department of Medicine, The Dartmouth Institute for Health Policy and Clinical Practice, Lebanon, USA

Abstract

Breast cancer screening is central to early breast cancer detection. Identifying and monitoring process measures for screening is a focus of the National Cancer Institute’s Population-based Research Optimizing Screening through Personalized Regimens (PROSPR) initiative, which requires participating centers to report structured data across the cancer screening continuum. We evaluate the accuracy of automated information extraction of imaging findings from radiology reports, which are available as unstructured text. We present prevalence estimates of imaging findings for breast imaging received by women who obtained care in a primary care network participating in PROSPR (n = 139,953 radiology reports) and compare automatically extracted data elements to a “gold standard” based on manual review for a validation sample of 941 randomly selected radiology reports, including mammograms, digital breast tomosynthesis, ultrasound, and magnetic resonance imaging (MRI). The prevalence of imaging findings varies by data element and modality (e.g., suspicious calcification noted in 2.6 % of screening mammograms, 12.1 % of diagnostic mammograms, and 9.4 % of tomosynthesis exams). In the validation sample, the accuracy of identifying imaging findings, including suspicious calcifications, masses, and architectural distortion (on mammogram and tomosynthesis); masses, cysts, non-mass enhancement, and enhancing foci (on MRI); and masses and cysts (on ultrasound), ranges from 0.8 to 1.0 for recall, precision, and F-measure. Information extraction tools can be used for accurate documentation of imaging findings as structured data elements from text reports for a variety of breast imaging modalities. These data can be used to populate screening registries to help elucidate more effective breast cancer screening processes.
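
The recall, precision, and F-measure reported above can be illustrated with a minimal sketch. It assumes that each report in a validation sample carries a binary gold-standard label (finding present or absent, from manual review) and a binary label from the extraction tool, and that the F-measure is the balanced F1 score; the abstract does not state which variant was used. The function name and the toy labels are hypothetical and are not data from the study.

```python
def precision_recall_f1(gold, predicted):
    """Score extracted labels against gold-standard labels for one data element.

    gold, predicted: equal-length sequences of booleans
    (True = finding present in the report).
    Returns (precision, recall, f1), assuming the balanced F1 variant.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # finding correctly extracted
    fp = sum(1 for g, p in pairs if not g and p)    # finding extracted but absent
    fn = sum(1 for g, p in pairs if g and not p)    # finding present but missed

    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy labels for ten reports (illustration only, not study data).
toy_gold = [True, True, True, True, False, False, False, False, False, False]
toy_pred = [True, True, True, False, True, False, False, False, False, False]

toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_gold, toy_pred)
print(f"precision={toy_precision:.2f} recall={toy_recall:.2f} F1={toy_f1:.2f}")
```

In this toy case there are three true positives, one false positive, and one false negative, so all three scores are 0.75. In practice the scores would be computed separately for each data element and modality, since a single pooled score can hide weak performance on rare findings.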