Geometric analysis of concept vectors based on similarity values

Lingua Sinica - Volume 3 - Pages 1-23 - 2017
Hui Liu1, Jianyong Duan2
1Shanghai University of International Business and Economics, Shanghai, China
2North China University of Technology, Beijing, China

Abstract

In this paper, we offer a geometric framework for computing a concept’s vector from its similarity position relative to other concepts in a concept space, that is, a set of concept vectors together with a distance function derived from a similarity model. We show that there exists an isometry that maps a concept space to a Euclidean space, so a concept vector can be mapped to a coordinate in Euclidean space and vice versa. Therefore, given only the similarity position of a concept, we can locate its coordinate, and subsequently its concept vector, using distance geometry methods. We prove that such mapping functions exist under certain conditions. We also discuss how to map non-numerical attributes. Finally, we present preliminary experimental results and some thoughts on implementing an attribute mining task. This work will benefit attribute retrieval tasks.
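To make the idea of locating a coordinate from a similarity position concrete, the following is a minimal sketch and not the authors' procedure. It assumes a hypothetical conversion from similarity to distance (distance = 1 - similarity), anchor concepts whose Euclidean coordinates are already known, and exactly n + 1 anchors in general position for an n-dimensional space. Under those assumptions, subtracting the first squared-distance equation from the others leaves a linear system, which is a standard distance geometry step. The names `similarity_to_distance`, `solve_linear`, and `locate` are illustrative only.

```python
import math


def similarity_to_distance(s):
    """Assumed conversion (not from the paper): distance = 1 - similarity."""
    return 1.0 - s


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("anchors are not in general position")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def locate(anchors, distances):
    """Recover a point in R^n from its distances to n + 1 anchor points.

    From |x - a_i|^2 = d_i^2, subtracting the i = 0 equation gives
    2 (a_i - a_0) . x = |a_i|^2 - |a_0|^2 - d_i^2 + d_0^2.
    """
    a0, d0 = anchors[0], distances[0]
    if len(anchors) != len(a0) + 1:
        raise ValueError("need exactly n + 1 anchors for n dimensions")
    rows, rhs = [], []
    for ai, di in zip(anchors[1:], distances[1:]):
        rows.append([2.0 * (p - q) for p, q in zip(ai, a0)])
        rhs.append(sum(p * p for p in ai) - sum(q * q for q in a0)
                   - di * di + d0 * d0)
    return solve_linear(rows, rhs)


# Toy data: three anchor concepts in a 2-D space and a hidden target.
anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target = (0.3, 0.4)

# Simulate the "similarity position": similarity of the target to each anchor.
similarities = [1.0 - math.dist(target, a) for a in anchors]

# Convert similarities back to distances and locate the coordinate.
distances = [similarity_to_distance(s) for s in similarities]
recovered = locate(anchors, distances)
print(recovered)
```

In this toy setup the recovered coordinate matches the hidden target up to floating-point error. With noisy similarities or more than n + 1 anchors, a least-squares fit would be a more natural choice than an exact solve.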
