Comprehensive structured knowledge base system construction with natural language presentation

Shirin Akther Khanam¹, Fei Liu¹, Yi-Ping Phoebe Chen¹
¹Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia

Abstract

Constructing an ontology-based machine-readable knowledge base system from different sources with minimal human intervention, a task known as ontology-based machine-readable knowledge base construction (OMRKBC), has long been an open problem. One key issue is how to build a large-scale OMRKBC process that yields appropriate structural information. To address this issue, we propose Natural Language Independent Knowledge Representation (NLIKR), a method that regards each word as a concept defined by its relations with other concepts. Using NLIKR, we propose a framework for the OMRKBC process that automatically develops a comprehensive ontology-based machine-readable knowledge base system (OMRKBS) with well-built structural information. First, as part of this framework, we propose formulas to discover concepts and their relations in the OMRKBS. Second, we resolve the challenges of obtaining rich structured information by developing algorithms and rules. Finally, we build this rich structured information into the OMRKBS. The resulting OMRKBS allows efficient word search and supports queries for words with a specific attribute. We conduct experiments on relational information extraction and analyze the results, which show that the OMRKBS achieved an accuracy of 84%, higher than that of the other knowledge base systems compared, namely ConceptNet, DBpedia and WordNet.
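The following minimal Python sketch illustrates the NLIKR idea under stated assumptions: each word is stored as a concept whose only definition is its set of labeled relations to other concepts, and a simple attribute query returns every word linked to a given value. The class `ConceptStore`, its methods, and relation names such as `is_a` and `has_color` are hypothetical illustrations chosen for this example; they are not the authors' formulas, algorithms, or data model.

```python
from collections import defaultdict


class ConceptStore:
    """Hypothetical NLIKR-style store (illustrative only): every word is a
    concept, and its meaning is given solely by (relation, concept) pairs."""

    def __init__(self):
        # concept -> relation -> set of related concepts
        self._relations = defaultdict(lambda: defaultdict(set))

    def add(self, concept, relation, target):
        """Record a directed relation such as ('apple', 'has_color', 'red')."""
        self._relations[concept][relation].add(target)
        # Touching the key registers the target as a concept in its own right,
        # since in this sketch every word is itself a concept.
        self._relations[target]

    def define(self, concept):
        """Return a concept's definition: its relations to other concepts."""
        return {rel: sorted(targets)
                for rel, targets in self._relations.get(concept, {}).items()}

    def query(self, relation, target):
        """Find all concepts whose `relation` points to `target`,
        e.g. every word with the attribute has_color=red."""
        return sorted(concept for concept, rels in self._relations.items()
                      if target in rels.get(relation, ()))


# Usage with made-up example relations (not data from the paper).
store = ConceptStore()
store.add("apple", "is_a", "fruit")
store.add("apple", "has_color", "red")
store.add("cherry", "is_a", "fruit")
store.add("cherry", "has_color", "red")
store.add("banana", "is_a", "fruit")
store.add("banana", "has_color", "yellow")

print(store.query("has_color", "red"))  # ['apple', 'cherry']
print(store.define("apple"))            # {'is_a': ['fruit'], 'has_color': ['red']}
```

Keeping relations in a nested dictionary makes a concept's definition a direct lookup, while the attribute query in this sketch is a linear scan; a real system at scale would more likely maintain a reverse index from (relation, target) to concepts.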
