Automated prioritization of sick newborns for whole genome sequencing using clinical natural language processing and machine learning

Springer Science and Business Media LLC - Volume 15 - Pages 1-9 - 2023
Bennet Peterson1, Edgar Javier Hernandez2, Charlotte Hobbs3, Sabrina Malone Jenkins4, Barry Moore2, Edwin Rosales3, Samuel Zoucha4, Erica Sanford3,5, Matthew N. Bainbridge3, Erwin Frise6, Albert Oriol7, Luca Brunelli4, Stephen F. Kingsmore3, Mark Yandell2
1Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
2Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
3Rady Children’s Institute for Genomic Medicine, San Diego, USA
4Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
5Department of Pediatrics, Cedars-Sinai Medical Center, Los Angeles, USA
6Fabric Genomics Inc., Oakland, USA
7Rady Children’s Hospital, San Diego, USA

Abstract

Rapidly and efficiently identifying critically ill infants for whole genome sequencing (WGS) is a costly and challenging task that is currently performed by scarce, highly trained experts, and it is a major bottleneck for applying WGS in the neonatal intensive care unit (NICU). There is a dire need for automated means to prioritize patients for WGS.

Institutional databases of electronic health records (EHRs) are logical starting points for identifying patients with undiagnosed Mendelian diseases. We have developed automated means to prioritize patients for rapid and whole genome sequencing (rWGS and WGS) directly from clinical notes. Our approach combines a clinical natural language processing (CNLP) workflow with a machine learning-based prioritization tool named Mendelian Phenotype Search Engine (MPSE).

MPSE accurately and robustly identified NICU patients selected for WGS by clinical experts from Rady Children’s Hospital in San Diego (AUC 0.86) and the University of Utah (AUC 0.85). In addition to effectively identifying patients for WGS, MPSE scores strongly prioritize diagnostic cases over non-diagnostic cases, with projected diagnostic yields exceeding 50% throughout the first and second quartiles of score-ranked patients.

Our results indicate that an automated pipeline for selecting acutely ill infants in NICUs for WGS can meet or exceed the diagnostic yields obtained through current selection procedures, which require time-consuming manual review of clinical notes and histories by specialized personnel.
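The two evaluation quantities reported above, the AUC for identifying expert-selected patients and the diagnostic yield per quartile of score-ranked patients, can be illustrated with a minimal Python sketch. This is not the MPSE implementation: the function names `roc_auc` and `yield_by_quartile` and the toy scores and labels are hypothetical and chosen only for illustration. AUC is computed here as the probability that a randomly chosen selected patient outscores a randomly chosen unselected one, with ties counted as one half.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def yield_by_quartile(scores: Sequence[float], diagnosed: Sequence[int]) -> List[float]:
    """Fraction of diagnostic cases in each quartile, highest scores first."""
    ranked = [d for _, d in sorted(zip(scores, diagnosed),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    yields = []
    for q in range(4):
        chunk = ranked[q * n // 4:(q + 1) * n // 4]
        yields.append(sum(chunk) / len(chunk) if chunk else float("nan"))
    return yields


# Hypothetical toy cohort (made-up numbers, not data from the study).
toy_scores = [9.1, 7.4, 6.8, 5.0, 3.2, 2.5, 1.1, 0.4]
toy_selected = [1, 1, 0, 1, 0, 1, 0, 0]   # 1 = chosen for WGS by experts
toy_diagnosed = [1, 1, 1, 0, 0, 1, 0, 0]  # 1 = sequencing was diagnostic

print(roc_auc(toy_scores, toy_selected))             # 0.8125
print(yield_by_quartile(toy_scores, toy_diagnosed))  # [1.0, 0.5, 0.5, 0.0]
```

In this toy cohort the first and second quartiles have yields of 100% and 50%, which is the kind of per-quartile summary the abstract refers to. The pairwise AUC loop is quadratic in cohort size and is written for clarity rather than speed.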
