Guiding the Training of Distributed Text Representation with Supervised Weighting Scheme for Sentiment Analysis

Data Science and Engineering - Volume 2 - Pages 178-186 - 2017
Zhe Zhao¹, Tao Liu¹, Shen Li², Bofang Li¹, Xiaoyong Du¹
¹ School of Information, Renmin University of China, Beijing, China
² Institute of Chinese Information Processing, Beijing Normal University, Beijing, China

Abstract

With the rapid growth of social media, sentiment analysis has received growing attention from both academia and industry. One line of research on sentiment analysis feeds bag-of-words (BOW) text representations into classifiers. Raw BOW usually requires a weighting scheme to perform well: important words are given higher weights and unimportant ones lower weights. Another line of research focuses on neural models, which learn distributed text representations automatically from raw text. In this paper, we take advantage of techniques from both lines of research. We use word weights to guide neural models to focus on important words. Various supervised weighting schemes are explored in this work. We find that better text features are learned for sentiment analysis when suitable weighting schemes are applied to neural models.
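To make the idea of a supervised weighting scheme concrete, the sketch below computes a smoothed log-count ratio per word from labeled documents. This is only one common scheme, chosen here for illustration; the abstract does not say which schemes the paper evaluates or how the weights enter the neural model. The function name `log_count_ratio_weights`, the smoothing parameter `alpha`, and the toy data are all assumptions.

```python
import math
from collections import Counter


def log_count_ratio_weights(docs, labels, alpha=1.0):
    """Return {word: |log-count ratio|} for tokenized docs with 0/1 labels.

    Illustrative assumption: a word's weight is the absolute log ratio of
    its smoothed relative frequency in positive vs. negative documents.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (pos if label == 1 else neg).update(tokens)

    vocab = set(pos) | set(neg)
    # Add-alpha smoothing keeps the ratio finite for one-sided words.
    pos_total = sum(pos.values()) + alpha * len(vocab)
    neg_total = sum(neg.values()) + alpha * len(vocab)

    weights = {}
    for word in vocab:
        p = (pos[word] + alpha) / pos_total
        q = (neg[word] + alpha) / neg_total
        weights[word] = abs(math.log(p / q))
    return weights


# Toy data (hypothetical): 1 = positive review, 0 = negative review.
docs = [["great", "movie"], ["great", "acting"],
        ["boring", "movie"], ["boring", "plot"]]
labels = [1, 1, 0, 0]
weights = log_count_ratio_weights(docs, labels)
```

In this toy corpus, "movie" occurs equally often in both classes and so receives a weight of zero, while "great" and "boring" receive the largest weights. A model guided by such weights would, under these assumptions, attend more to the polarity-bearing words.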
