Analysing the requirements for an Open Research Knowledge Graph: use cases, quality requirements, and construction strategies

Springer Science and Business Media LLC - Volume 23 - Pages 33-55 - 2021
Arthur Brack1,2, Anett Hoppe1, Markus Stocker1, Sören Auer1,2, Ralph Ewerth1,2
1TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
2L3S Research Center, Leibniz University Hannover, Hannover, Germany

Abstract

Current science communication has a number of drawbacks and bottlenecks that have recently been the subject of discussion. Among others, the rising number of published articles makes it nearly impossible to get a full overview of the state of the art in a given field, and reproducibility is hampered by fixed-length, document-based publications, which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of these issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective and present a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting and reviewing the daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.

In: Williamson, C.L., Zurko, M.E., Patel-Schneider, P.F., Shenoy, P.J. (eds.) Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, 2007, pp. 697–706. ACM (2007). https://doi.org/10.1145/1242572.1242667 Talburt, J.R.: 2—principles of information quality. In: Talburt, J.R. (ed.) Entity Resolution and Information Quality, pp. 39–62. Morgan Kaufmann, Boston (2011). https://doi.org/10.1016/B978-0-12-381972-7.00002-6. http://www.sciencedirect.com/science/article/pii/B9780123819727000026 Teufel, S., Siddharthan, A., Batchelor, C.R.: Towards domain-independent argumentative zoning: Evidence from chemistry and computational linguistics. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Singapore, A Meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1493–1502. ACL (2009). https://www.aclweb.org/anthology/D09-1155/ Vahdati, S., Fathalla, S., Auer, S., Lange, C., Vidal, M.: Semantic representation of scientific publications. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds.) Digital Libraries for Open Knowledge—23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, 2019, Proceedings, Lecture Notes in Computer Science, vol. 11799, pp. 375–379. Springer (2019). https://doi.org/10.1007/978-3-030-30760-8_37 Vandenbussche, P., Atemezing, G., Poveda-Villalón, M., Vatant, B.: Linked open vocabularies (LOV): a gateway to reusable semantic vocabularies on the web. Semant. Web 8(3), 437–452 (2017). https://doi.org/10.3233/SW-160213 Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014). https://doi.org/10.1145/2629489 Waard, A., Tel, G.: The ABCDE format enabling semantic conference proceedings. In: Völkel, M., Schaffert, S. (eds.) 
SemWiki2006, First Workshop on Semantic Wikis—From Wiki to Semantics, Proceedings, Co-located with the ESWC2006, Budva, Montenegro, 2006, CEUR Workshop Proceedings, vol. 206. CEUR-WS.org (2006). http://ceur-ws.org/Vol-206/paper8.pdf Wang, R.Y., Strong, D.M.: Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12(4), 5–33 (1996) Weikum, G., Dong, L., Razniewski, S., Suchanek, F.M.: Machine knowledge: creation and curation of comprehensive knowledge bases. CoRR abs/2009.11564 (2020). arXiv:2009.11564 Xiong, C., Power, R., Callan, J.: Explicit semantic ranking for academic search via knowledge graph embedding. In: Barrett, R., Cummings, R., Agichtein, E., Gabrilovich, E. (eds.) Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 2017, pp. 1271–1279. ACM (2017). https://doi.org/10.1145/3038912.3052558 Yaman, B., Pasin, M., Freudenberg, M.: Interlinking scigraph and dbpedia datasets using link discovery and named entity recognition techniques. In: Eskevich, M., de Melo, G., Fäth, C., McCrae, J.P., Buitelaar, P., Chiarcos, C., Klimek, B., Dojchinovski, M. (eds.) 2nd Conference on Language, Data and Knowledge, LDK 2019, Leipzig, Germany, OASICS, vol. 70, pp. 15:1–15:8. Schloss Dagstuhl–Leibniz–Zentrum für Informatik (2019). https://doi.org/10.4230/OASIcs.LDK.2019.15 Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: a survey. Semant. Web 7(1), 63–93 (2016). https://doi.org/10.3233/SW-150175 Zhang, Y., Wang, M., Saberi, M., Chang, E.: From big scholarly data to solution-oriented knowledge repository. Front. Big Data 2, 38 (2019). https://doi.org/10.3389/fdata.2019.00038