Evaluation of an Automated Information Extraction Tool for Imaging Data Elements to Populate a Breast Cancer Screening Registry
Abstract
Breast cancer screening is central to early breast cancer detection. Identifying and monitoring process measures for screening is a focus of the National Cancer Institute’s Population-based Research Optimizing Screening through Personalized Regimens (PROSPR) initiative, which requires participating centers to report structured data across the cancer screening continuum. We evaluate the accuracy of automated information extraction of imaging findings from radiology reports, which are available only as unstructured text. We present prevalence estimates of imaging findings for breast imaging received by women who obtained care in a primary care network participating in PROSPR (n = 139,953 radiology reports), and we compare automatically extracted data elements to a “gold standard” based on manual review of a validation sample of 941 randomly selected radiology reports, including mammograms, digital breast tomosynthesis, ultrasound, and magnetic resonance imaging (MRI). The prevalence of imaging findings varies by data element and modality (e.g., suspicious calcification is noted in 2.6% of screening mammograms, 12.1% of diagnostic mammograms, and 9.4% of tomosynthesis exams). In the validation sample, recall, precision, and F-measure range from 0.8 to 1.0 for identifying imaging findings, including suspicious calcifications, masses, and architectural distortion (on mammogram and tomosynthesis); masses, cysts, non-mass enhancement, and enhancing foci (on MRI); and masses and cysts (on ultrasound). Information extraction tools can be used for accurate documentation of imaging findings as structured data elements from text reports for a variety of breast imaging modalities. These data can be used to populate screening registries to help elucidate more effective breast cancer screening processes.
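The validation metrics named above (recall, precision, and F-measure) can be illustrated with a minimal sketch. It assumes each report is reduced to a single present/absent label per data element, once from manual review and once from the extraction tool, and that the F-measure is the balanced F1 score; the abstract does not state the weighting. The function name `score_extraction` and the sample labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    precision: float
    recall: float
    f_measure: float


def score_extraction(gold: Sequence[bool], predicted: Sequence[bool]) -> Scores:
    """Compare tool output against manual review for one data element.

    gold[i] is True if manual review found the element in report i;
    predicted[i] is True if the extraction tool reported it.
    (Hypothetical helper; assumes the balanced F1 form of the F-measure.)
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = sum(1 for g, p in zip(gold, predicted) if g and p)      # true positives
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)  # false positives
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)  # false negatives

    # Define each ratio as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f_measure)


# Made-up labels for six reports (illustration only, not study data).
gold = [True, True, True, True, False, False]
predicted = [True, True, True, False, True, False]
scores = score_extraction(gold, predicted)
print(scores)
```

With these made-up labels there are three true positives, one false positive, and one false negative, so precision, recall, and F-measure all equal 0.75.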