Integrating the optimal classifier set for sentiment analysis
Abstract
Automatic identification of users' sentiment is important for many Web applications, such as recommender systems and business intelligence. Sentiment analysis can be treated as a classification task that tries to identify the overall sentiment a user expresses in a document. However, it is difficult to choose a classifier for a particular domain, because each classifier performs differently across domains. We therefore propose a three-phase multiple-classifier solution for sentiment analysis, in which an optimal set of classifiers is selected and integrated automatically. To tackle the combinatorial explosion of classifier set selection, we design an approximation algorithm that is provably a 2-approximation. Finally, extensive experiments on real-world datasets show that the proposed solution outperforms not only the best single-classifier methods but also state-of-the-art ensemble learning competitors.