Matching object catalogues

Innovations in Systems and Software Engineering - Volume 4 - Pages 315-328 - 2008
Luiz André P. Leme¹, Daniela F. Brauner², Karin K. Breitman¹, Marco A. Casanova¹, Alexandre Gazola¹
¹ Department of Informatics, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
² RNP, Brazilian National Research and Education Network, Rio de Janeiro, Brazil

Abstract

A catalogue holds information about a set of objects, typically classified using terms taken from a given thesaurus and described by a set of attributes. Matching a pair of catalogues means finding a relationship between the terms of their thesauri and a relationship between their attributes. This paper first introduces a similarity-based matching approach that applies to both thesaurus and attribute matching. It then describes matchings based on mutual information and introduces variations that exploit certain heuristics. Finally, it discusses experimental results that evaluate the precision of the matchings and measure the influence of the heuristics.
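
To make the idea of a mutual-information matching concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes that the two catalogues describe some of the same objects under shared identifiers, that each object carries exactly one thesaurus term, and that term pairs are scored with pointwise mutual information estimated over the shared objects. The function name `pmi_matching` and the sample data are hypothetical.

```python
import math
from collections import Counter


def pmi_matching(cat_a, cat_b):
    """Match each term of catalogue A to the best-scoring term of catalogue B.

    cat_a, cat_b: dicts mapping an object id to a single term (assumption).
    Returns a dict: term of A -> (term of B, PMI score in bits).
    """
    # Only objects present in both catalogues give co-occurrence evidence.
    shared = cat_a.keys() & cat_b.keys()
    n = len(shared)
    if n == 0:
        return {}

    count_a = Counter(cat_a[o] for o in shared)
    count_b = Counter(cat_b[o] for o in shared)
    count_ab = Counter((cat_a[o], cat_b[o]) for o in shared)

    best = {}
    for (a, b), c in count_ab.items():
        # PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), with counts over n objects.
        score = math.log2(c * n / (count_a[a] * count_b[b]))
        if a not in best or score > best[a][1]:
            best[a] = (b, score)
    return best


# Hypothetical catalogues: object id -> classification term.
catalogue_a = {1: "river", 2: "river", 3: "city", 4: "city"}
catalogue_b = {1: "stream", 2: "stream", 3: "town", 4: "town", 5: "lake"}

matches = pmi_matching(catalogue_a, catalogue_b)
for term, (candidate, score) in sorted(matches.items()):
    print(f"{term} -> {candidate} (PMI = {score:.2f} bits)")
```

Term pairs that never co-occur on a shared object are simply not scored here. The variations and heuristics mentioned in the abstract are not modelled in this sketch.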
