Humans and machines in biomedical knowledge curation: hypertrophic cardiomyopathy molecular mechanisms’ representation
Abstract
Biomedical knowledge is dispersed across the scientific literature and grows constantly. Curation is the extraction of knowledge from unstructured data into a computable form, and it can be done manually or automatically. Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease, and its genotype–phenotype associations are still incompletely understood. We compared human- and machine-curated models of HCM molecular mechanisms and examined the performance of different machine approaches for that task.
We created six models representing HCM molecular mechanisms using different approaches and made them publicly available. We analyzed the models as networks and sought to explain their differences by analyzing two factors that affect the quality of machine-curated models: query constraints and the performance of reading systems. This work also produced the Interactive HCM map, the only publicly available knowledge resource dedicated to HCM.
The sizes and topological parameters of the networks differed notably, and consensus between networks in terms of centrality measures was low. The networks agreed on the importance of only one element (calcium). Models with a reduced level of noise were generated, and cooperatively working elements were detected. The REACH and TRIPS reading systems showed much higher accuracy than Sparser, but at the cost of extraction performance. TRIPS proved to be the best single reading system for text segments about HCM in terms of the compromise between accuracy and extraction performance.
Different curation approaches can produce models of the same disease with diverse characteristics, and these models give rise to entirely different conclusions in subsequent analysis. The final purpose of the model should direct the choice of curation techniques.
Manual curation represents the gold standard for information extraction in biomedical research and is most suitable when only high-quality elements are required for models. Automated curation provides more content, but a high level of noise is to be expected. Different curation strategies can reduce the level of human input needed. Given its rapid growth, biomedical knowledge would benefit greatly if computers could assist in its analysis on a larger scale.
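The idea of consensus between networks on their most central nodes can be illustrated with a minimal sketch. It assumes degree centrality as the measure and uses three small, hypothetical edge lists; neither the edges nor the helper names (`degree_centrality`, `top_k`, `jaccard`) come from the study, and the six published models are not reproduced here.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def top_k(centrality, k):
    """The k highest-scoring nodes; ties are broken by node name."""
    ranked = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
    return {node for node, _ in ranked[:k]}


def jaccard(a, b):
    """Overlap of two node sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy networks, invented for illustration only.
models = {
    "manual": [("calcium", "MYH7"), ("calcium", "TNNT2"),
               ("calcium", "MYBPC3"), ("MYH7", "MYBPC3")],
    "machine_a": [("calcium", "CaMKII"), ("calcium", "calcineurin"),
                  ("calcineurin", "NFAT"), ("calcium", "NFAT"),
                  ("MAPK", "NFAT")],
    "machine_b": [("calcium", "RYR2"), ("calcium", "PLN"),
                  ("PLN", "SERCA2"), ("RYR2", "SERCA2"),
                  ("calcium", "SERCA2")],
}

# Top-2 nodes per model, then the nodes shared by every model.
hubs = {name: top_k(degree_centrality(edges), 2) for name, edges in models.items()}
consensus = set.intersection(*hubs.values())
print(consensus)  # {'calcium'}
```

In this toy setup the pairwise overlap of the top-2 sets is low (`jaccard(hubs["manual"], hubs["machine_a"])` is 1/3), while a single node is shared by all three models. Other centrality measures or a different `k` could give a different picture.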