Matching object catalogues
Abstract
A catalogue holds information about a set of objects, typically classified using terms taken from a given thesaurus and described by a set of attributes. Matching a pair of catalogues means finding a relationship between the terms of their thesauri and a relationship between their attributes. This paper first introduces a matching approach, based on the notion of similarity, that applies to both thesaurus matching and attribute matching. It then describes matchings based on mutual information and introduces variations that explore certain heuristics. Finally, it discusses experimental results that evaluate the precision of the matchings and measure the influence of the heuristics.
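The abstract does not give the concrete similarity measure or the exact mutual-information formulation, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that both catalogues share some identifiable objects and that each thesaurus term is represented by the set of shared object IDs it classifies. The Jaccard coefficient is swapped in as an illustrative similarity, mutual information is computed over the binary events "object is classified by term a" and "object is classified by term b", and each term is mapped to its best-scoring counterpart. The names `jaccard`, `mutual_information`, `match_terms` and the toy terms are hypothetical.

```python
from math import log
from typing import Callable, Dict, Set, Tuple

# Hypothetical representation (assumption): each thesaurus term maps to the set of
# IDs of the objects it classifies, restricted to objects present in both catalogues.
Catalogue = Dict[str, Set[int]]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Illustrative similarity: shared objects over objects classified by either term."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mutual_information(a: Set[int], b: Set[int], n: int) -> float:
    """Mutual information (in nats) between the binary events 'classified by term a'
    and 'classified by term b', estimated over n shared objects."""
    counts = {
        (1, 1): len(a & b),
        (1, 0): len(a - b),
        (0, 1): len(b - a),
        (0, 0): n - len(a | b),
    }
    p_a = {1: len(a) / n, 0: 1 - len(a) / n}
    p_b = {1: len(b) / n, 0: 1 - len(b) / n}
    mi = 0.0
    for (x, y), c in counts.items():
        if c:  # empty cells contribute 0, since 0 * log 0 is taken as 0
            p_xy = c / n
            mi += p_xy * log(p_xy / (p_a[x] * p_b[y]))
    return mi


def match_terms(
    cat_a: Catalogue,
    cat_b: Catalogue,
    score: Callable[[Set[int], Set[int]], float],
    threshold: float = 0.0,
) -> Dict[str, Tuple[str, float]]:
    """Map each term of cat_a to its best-scoring term of cat_b, if above threshold."""
    matches: Dict[str, Tuple[str, float]] = {}
    for term_a, objs_a in cat_a.items():
        scored = [(score(objs_a, objs_b), term_b) for term_b, objs_b in cat_b.items()]
        if not scored:
            continue
        best_score, best_term = max(scored)
        if best_score > threshold:
            matches[term_a] = (best_term, best_score)
    return matches


if __name__ == "__main__":
    # Hypothetical toy data: six objects (IDs 1..6) that appear in both catalogues.
    thesaurus_a = {"river": {1, 2, 3}, "lake": {4, 5}, "city": {6}}
    thesaurus_b = {"stream": {1, 2}, "water body": {3, 4, 5}, "town": {6}}
    shared = 6
    print(match_terms(thesaurus_a, thesaurus_b, jaccard))
    print(match_terms(thesaurus_a, thesaurus_b,
                      lambda a, b: mutual_information(a, b, shared)))
```

On this toy data both scorings map river to stream, lake to water body and city to town. One design caveat: mutual information also rewards strong negative association, so two disjoint terms can still score above zero. That is one reason a plain best-match rule may need extra heuristics. Under the same assumption, attributes could be scored in the same way by representing each attribute through the set of objects, or values, it describes.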