Integrating semantic directions with concept mover’s distance to measure binary concept engagement

Journal of Computational Social Science - Volume 4 - Pages 231-242 - 2020

Marshall A. Taylor¹, Dustin S. Stoltz²

¹Department of Sociology, New Mexico State University, Las Cruces, USA

²Department of Sociology and Anthropology, Lehigh University, Bethlehem, USA

Abstract

In an earlier article published in this journal (“Concept Mover’s Distance,” 2019), we proposed Concept Mover’s Distance (CMD), a method for measuring concept engagement in texts that uses word embeddings to find the minimum cost necessary for words in an observed document to “travel” to words in a “pseudo-document” consisting only of words denoting a concept of interest. One potential limitation we noted is that, because words associated with opposing concepts will be located close to one another in the embedding space, documents will likely appear similarly close to starkly opposing concepts (e.g., “life” and “death”). Using aggregate vector differences between antonym pairs to extract a direction in the semantic space pointing toward a pole of the binary opposition (following “The Geometry of Culture,” American Sociological Review, 2019), we illustrate how CMD can be used to measure a document’s engagement with binary concepts.
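The direction-extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-dimensional vectors, the antonym pairs, and the helper names `semantic_direction` and `cosine` are all hypothetical, chosen only to show how averaging antonym-pair differences yields a vector pointing toward one pole of a binary. The sketch stops at the direction itself and does not perform the minimum-cost "travel" computation that CMD relies on.

```python
import numpy as np

# Hypothetical 3-dimensional toy embeddings, invented purely for illustration.
# Real analyses would use pretrained embeddings with hundreds of dimensions.
toy_embeddings = {
    "life":  np.array([0.9, 0.1, 0.3]),
    "death": np.array([-0.8, 0.2, 0.3]),
    "alive": np.array([0.7, 0.0, 0.4]),
    "dead":  np.array([-0.9, 0.1, 0.4]),
    "birth": np.array([0.6, 0.3, 0.1]),
    "grave": np.array([-0.7, 0.3, 0.2]),
}


def semantic_direction(embeddings, antonym_pairs):
    """Average the (first - second) differences of each pair, then unit-normalize."""
    diffs = [embeddings[pos] - embeddings[neg] for pos, neg in antonym_pairs]
    direction = np.mean(diffs, axis=0)
    return direction / np.linalg.norm(direction)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Assumed antonym pairs; the first word of each pair marks the "life" pole.
pairs = [("life", "death"), ("alive", "dead")]
direction = semantic_direction(toy_embeddings, pairs)

# Words not used to build the direction can be scored against it:
# positive values lean toward the "life" pole, negative toward "death".
birth_score = cosine(toy_embeddings["birth"], direction)
grave_score = cosine(toy_embeddings["grave"], direction)
print(f"birth: {birth_score:+.3f}, grave: {grave_score:+.3f}")
```

In this toy setup the sign of the score separates the two poles, which is the property the abstract relies on when it replaces a single concept word with a direction between opposing concepts.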

References

Arseniev-Koehler, A., & Foster, J. (2020). Machine learning as a model for cultural learning: Teaching an algorithm what it means to be fat. SocArXiv. https://osf.io/preprints/socarxiv/c9yj3/.

Atasu, K., Parnell, T., Dünner, C., Sifalakis, M., Pozidis, H., Vasileiadis, V., et al. (2017). Linear-complexity relaxed word mover’s distance with GPU acceleration. In J.-Y. Nie, Z. Obradovic, T. Suzumura, R. Ghosh, R. Nambiar, C. Wang, et al. (Eds.), 2017 IEEE international conference on big data (pp. 889–896). Boston: IEEE.

Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). Quantifying and reducing stereotypes in word embeddings. arXiv. https://arxiv.org/abs/1606.06121.

Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29, pp. 4349–4357). Curran Associates Inc.

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186.

Ethayarajh, K., Duvenaud, D., & Hirst, G. (2019). Understanding undesirable word embedding associations. arXiv. https://arxiv.org/abs/1908.06361.

Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences of the United States of America, 115(16), E3635–E3644.

Goldberg, A. (2011). Mapping shared understandings using relational class analysis: The case of the cultural omnivore reexamined. American Journal of Sociology, 116(5), 1397–1436.

Kassambara, A. (2020). ggpubr: ‘ggplot2’ based publication ready plots. R package version 0.2.5. https://cran.r-project.org/web/packages/ggpubr/ggpubr.pdf. Accessed 11 June 2020.

Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing the meanings of class through word embeddings. American Sociological Review, 84(5), 905–949.

Kusner, M., Sun, Y., Kolkin, N., & Weinberger, K. (2015). From word embeddings to document distances. In International conference on machine learning (pp. 957–966).

Lakoff, G. (2010). Moral politics: How liberals and conservatives think. Chicago: University of Chicago Press.

Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016). Autoencoding beyond pixels using a learned similarity metric. In M. F. Balcan & K. Q. Weinberger (Eds.), Proceedings of the 33rd international conference on machine learning (pp. 1558–1566). New York: ACM.

Makrai, M., Nemeskey, D., & Kornai, A. (2013). Applicative structure in vector space models. In A. Allauzen, H. Larochelle, C. Manning, & R. Socher (Eds.), Proceedings of the workshop on continuous vector space models and their compositionality (pp. 59–63). Sofia, Bulgaria: ACL.

Mikolov, T., Yih, W.-T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 746–751). aclweb.org.

Project Gutenberg. (2020). https://www.gutenberg.org/wiki/Main_Page.

Rubner, Y., Tomasi, C., & Guibas, L. J. (1998). A metric for distributions with applications to image databases. In Sixth international conference on computer vision (IEEE Cat. No. 98CH36271) (pp. 59–66). IEEE.

Sahlgren, M. (2008). The distributional hypothesis. Italian Journal of Linguistics, 20, 33–53.

Selivanov, D., Bickel, M., & Wang, Q. (2020). text2vec: Modern text mining framework for R. R package version 0.6. https://cran.r-project.org/web/packages/text2vec/text2vec.pdf. Accessed 11 June 2020.

Stoltz, D. S., & Taylor, M. A. (2019). Concept mover’s distance: Measuring concept engagement via word embeddings in texts. Journal of Computational Social Science, 2(2), 293–313.

Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). New York: Springer. (ISBN 0-387-95457-0).

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. New York: Springer.

Woolley, J. T., & Peters, G. (2008). The American presidency project, Santa Barbara. Available from: http://www.presidency.ucsb.edu/ws.
