Concept Mover’s Distance: measuring concept engagement via word embeddings in texts

Journal of Computational Social Science - Tập 2 Số 2 - Trang 293-313 - 2019
Dustin S. Stoltz1, Marshall A. Taylor2
1Department of Sociology, University of Notre Dame, Notre Dame, USA
2Department of Sociology, New Mexico State University, Las Cruces, USA

Tóm tắt

Từ khóa


Tài liệu tham khảo

Artetxe, M., Labaka, G., & Agirre, E. (2016). Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2289–2294). Austin: Association for Computational Linguistics.

Benoit, K., & Watanabe, K. (2019). quanteda.corpora: A Collection of Corpora for quanteda. R package version 0.86. https://github.com/quanteda/quanteda.corpora . Accessed 18 Feb 2019.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

Boas, F. S. (1896). Shakespeare and his predecessors. London: John Murray.

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.

Bonikowski, B., & Gidron, N. (2016). The populist style in American politics: Presidential campaign discourse, 1952–1996. Social Forces, 94, 1593–1621.

Brokos, G. -I., Malakasiotis, P, & Androutsopoulos, I. (2016). Using centroids of word embeddings and Word Mover’s Distance for biomedical document retrieval in question answering. arXiV preprint arXiv:1608.03905 .

Core R Team. (2013). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

Dennett, D. C. (1991). Consciousness explained. Boston: Back Bay Books.

Diuk, C. G., Fernandez Slezak, D., Raskovsky, I., Sigman, M., & Cecchi, G. A. (2012). A quantitative philology of introspection. Frontiers in Integrative Neuroscience, 6, 1–12.

Dodds, E. R. (1951). The Greeks and the irrational. Berkeley: The University of California Press.

Ellis, N. C. (2019). Essentials of a theory of language cognition. The Modern Language Journal, 103, 39–60.

Emirbayer, M. (1997). Manifesto for relational sociology. American Journal of Sociology, 103, 281–317.

Firth, J. (1957). A synopsis of linguistic theory, 1930–1955. In Studies in linguistic analysis (pp. 168–205). Oxford: Blackwell.

Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115, E3635–E3644.

Garvin, P. L. (1962). Computer participation in linguistic research. Language, 38(4), 385–389.

Greimas, A. (1983). Structural semantics: An attempt at a method. Lincoln: University of Nebraska Press.

Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. In: Proceedings of the $$54{th}$$ Annual Meeting of the Association for Computational Linguistics (pp. 1489–1501). Berlin: Association for Computational Linguistics.

Ignatow, G. (2009). Culture and embodied cognition: Moral discourses in internet support groups for overeaters. Social Forces, 88, 643–670.

Jaynes, J. (1976). The origins of consciousness in the breakdown of the bicameral mind. Boston: Houghton Mifflin.

Jaynes, J. (1986). Consciousness and the voices of the mind. Lecture given at the Canadian Psychological Association Symposium on Consciousness. Halifax: Canadian Psychological Association.

Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., & Mikolov, T. (2016). FastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651 .

Klementiev, A., Titov, I., & Bhattarai, B. (2012). Inducing crosslingual distributed representations of words. In: Proceedings of COLING 2012: Technical Papers (pp. 1459–1474 ). Mumbai: Association for Computational Linguistics.

Kozlowski, A. C., Taddy, M., & Evans, J. A. (2018). The geometry of culture: Analyzing meaning through word embeddings. arXiv preprint arXiv:1803.09288 .

Kusner, M. J., Sun, Y., Kolkin, N. I., & Weinberger, K. Q. (2015). From word embeddings to document distances. In: Proceedings of the $$32{nd}$$ International Conference on Machine Learning. Lille: International Machine Learning Society.

Lakoff, G. (2002). Moral politics: How liberals and conservatives think. Chicago: The University of Chicago Press.

Leaf, W. (1892). A companion to the iliad, for English readers. London: MacMillan and Co.

Lenci, A. (2018). Distributional models of word meaning. Annual Review of Linguistics, 4, 151–171.

Levina, E., & Peter, B. (2001). The Earth Mover’s Distance is the mallows distance: Some insights from statistics. In: IEEE Proceedings of the Eighth IEEE International Conference on Computer Vision. Vancouver: Institute of Electrical and Electronics Engineers.

Meyers, V. (1991). George Orwell. London: MacMillan.

Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In: Proceedings of NAACL-HLT 2013 (pp. 746–751 ). Atlanta: Association for Computational Linguistics.

Mohr, John W. (1998). Measuring meaning structures. Annual Review of Sociology, 24, 345–370.

Mullins, Daniel Austin, Hoyer, Daniel, Collins, Christina, Currie, Thomas, Freeney, Kevin, François, Pieter, et al. (2018). A systematic assessment of ’Axial Age’ proposals using global comparative historical evidence. American Sociological Review, 83, 596–626.

Pagel, Mark, Atkinson, Quentin D., & Meade, Andrew. (2007). Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature, 49, 717–721.

Pele, O., & Werman, M. (2009). Fast and Robust Earth Mover’s Distances. In: 2009 IEEE $$12{th}$$ International Conference on Computer Vision (pp. 460–467). Kyoto: Institute of Electrical and Electronics Engineers.

Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (pp. 1532–1543). Doha: Association for Computational Linguistics.

Project Gutenberg. (2019). Project Gutenberg. https://www.gutenberg.org/wiki/Main_Page . Accessed 18 Feb 2019.

Raskovsky, I., Fernández Slezak, D., Diuk, C. G., & Cecchi, G. A. (2010). The emergence of the modern concept of introspection: A quantitative linguistic analysis. In: Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas (pp. 68–75 ). Los Angeles: Association for Computational Linguistics.

Rosch, Eleanor, & Mervis, Carolyn B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573–605.

Rubner, Y., Tomasi, C., & Guibas, L. J. (1998). A metric for distributions with applications to image databases. In: Proceedings of the 1998 IEEE International Conference on Computer Vision. Bombay: Institute of Electrical and Electronics Engineers.

Scheff, Thomas J. (2011). What’s love got to do with it? Emotions and relationships in pop songs. New York: Routledge.

Schloerke, B., Crowley, J., Cook, D., Hofmann, H., Wickham, H., Briatte, F., Marbach, M., Thoen, E., Elberg, A., & Larmarange, J. (2018). “GGally: Extension to ‘ggplot2.”’ R package version 1.4.0. https://cran.r-project.org/web/packages/GGally/GGally.pdf . Accessed 18 Feb 2019.

Snell, B. (2013). The Discovery of the Mind: The Greek Origins of European Thought. Translated by T. G. Rosenmeyer. Tacoma: Angelico Press (1953) .

Selivanov, D., & Wang, Q. (2018). text2vec: Modern text mining framework for R.” R package 0.5.1 documentation. https://cran.r-project.org/web/packages/text2vec/text2vec.pdf . Accessed 16 Feb 2019.

Smith, S., Turban, D., Hamblin, S., & Hammerla, N. (2017). Offline bilingual word vectors, orthogonal transformations and the inverted softmax. arXiv preprint arXiv:1702.03859 .

Taylor, John R. (2003). Linguistic categorization. New York: Oxford University Press.

Taylor, Marshall A., Stoltz, Dustin S., & McDonnell, Terence E. (2019). Binding signicance to form: Cultural objects, neural binding, and cultural change. Poetics, 73, 1–16.

The American Presidency Project. (2018). Annual Messages to Congress on the State of the Union (Washington 1790—Trump 2018). https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/annual-messages-congress-the-state-the-union . Accessed 3 Feb 2019.

Urban Institute Research. (2019). urbnthemes: Urban Institute’s ggplot2 Theme and Tools. https://github.com/UI-Research/urbnthemes . Accessed 18 Feb 2019.

Xing, C., Wang, D., Liu, C., & Lin, Y. (2015). Normalized word embedding and orthogonal transform for bilingual word translation. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1006–1011). Denver: Association for Computational Linguistics.

Wickham, Hadley. (2016). ggplot2: Elegant graphics for data science. New York: Springer.