Guiding the Training of Distributed Text Representation with Supervised Weighting Scheme for Sentiment Analysis
Tóm tắt
With the rapid growth of social media, sentiment analysis has received growing attention from both academic and industrial fields. One line of researches for sentiment analysis is to feed bag-of-words (BOW) text representation into classifiers. Usually, raw BOW requires weighting schemes to obtain better performance, where important words are given more weights while unimportant ones are given less weights.
Another line of researches focuses on neural models, where distributed text representations are learned from raw texts automatically. In this paper, we take advantages of techniques in both lines of researches.
We use words’ weights to guide neural models to focus on important words. Various supervised weighting schemes are explored in this work. We discover that better text features are learned for sentiment analysis when suitable weighting schemes are applied upon neural models.
Tài liệu tham khảo
Church KW, Hanks P (1990) Word association norms, mutual information, and lexicography. Comput Linguist 16(1):22–29
Dai AM, Le QV (2015) Semi-supervised sequence learning. In: Advances in neural information processing systems 28: annual conference on neural information processing systems 2015, December 7–12, Montreal, Quebec, Canada, pp 3079–3087
Deng ZH, Luo KH, Yu HL (2014) A study of supervised term weighting scheme for sentiment analysis. Expert Syst Appl 41(7):3506–3513
Denil M, Demiraj A, Kalchbrenner N, Blunsom P, de Freitas N (2014) Modelling, visualising and summarising documents with a single convolutional neural network. arXiv preprint arXiv:1406.3830
Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874
Goldberg Y (2016) A primer on neural network models for natural language processing. J Artif Intell Res 57:345–420
Iyyer M, Manjunatha V, Boyd-Graber JL, Daumé III H (2015) Deep unordered composition rivals syntactic methods for text classification. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the asian federation of natural language processing, ACL 2015, July 26–31, Beijing, China, volume 1: long papers, pp 1681–1691
Johnson R, Zhang T (2015) Semi-supervised convolutional neural networks for text categorization via region embedding. In: Advances in neural information processing systems 28: annual conference on neural information processing systems 2015, December 7–12, Montreal, Quebec, Canada, pp 919–927
Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882
Kim Y, Zhang O (2014) Credibility adjusted term frequency: A supervised term weighting scheme for sentiment analysis and text classification. arXiv preprint arXiv:1405.3518
Le QV, Mikolov T (2014) Distributed representations of sentences and documents. ICML 14:1188–1196
Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. In: Advances in neural information processing systems 27: annual conference on neural information processing systems 2014, December 8–13, Montreal, Quebec, Canada, pp 2177–2185
Levy O, Goldberg Y, Dagan I (2015) Improving distributional similarity with lessons learned from word embeddings. Trans Assoc Comput Linguist 3:211–225
Li B, Liu T, Du X, Zhang D, Zhao Z (2015) Learning document embeddings by predicting n-grams for sentiment classification of long movie reviews. arXiv preprint arXiv:1512.08183
Li J (2014) Feature weight tuning for recursive neural networks. arXiv preprint arXiv:1412.3714
Maas AL, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, Vol 1. pp 142–150. Association for Computational Linguistics
Martineau J, Finin T (2009) Delta tfidf: an improved feature space for sentiment analysis. Icwsm 9:106
Mesnil G, Mikolov T, Ranzato M, Bengio Y (2014) Ensemble of generative and discriminative techniques for sentiment analysis of movie reviews. arXiv preprint arXiv:1412.5335
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems 26: 27th annual conference on neural information processing systems 2013, December 5–8, Lake Tahoe, NV, USA, pp 3111–3119
Paltoglou G, Thelwall M (2010) A study of information retrieval weighting schemes for sentiment analysis. In: Proceedings of the 48th annual meeting of the association for computational linguistics, pp 1386–1395. Association for Computational Linguistics
Pang B, Lee L, Vaithyanathan S (2002) Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Vol 10, pp 79–86. Association for Computational Linguistics
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. EMNLP 14:1532–1543
Socher R, Huval B, Manning CD, Ng AY (2012) Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 Joint Conference on empirical methods in natural language processing and computational natural language learning, pp 1201–1211. Association for Computational Linguistics
Wang S, Manning CD (2012) Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th annual meeting of the asssociation for computational linguistics: short papers, Vol 2, pp 90–94. Association for Computational Linguistics
Zhao Z, Liu T, Hou X, Li B, Du X (2016) Distributed text representation with weighting scheme guidance for sentiment analysis. In: Asia-Pacific web conference, Springer, pp 41–52