Integrating the optimal classifier set for sentiment analysis

Social Network Analysis and Mining - Volume 5 - Pages 1-13 - 2015
Yuming Lin1, Xiaoling Wang2, You Li3, Aoying Zhou2
1Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, China
2Institute for Data Science and Engineering, East China Normal University, Shanghai, China
3Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, Guilin University of Electronic Technology, Guilin, China

Abstract

Automatic identification of users’ sentiment is important for many Web applications, such as recommender systems and business intelligence. Sentiment analysis can be treated as a classification task that tries to identify the overall sentiment a user expresses in a document. However, it is difficult for users to select a classifier for a specific domain, since the performance of each classifier varies across domains. We therefore propose a three-phase multiple-classifier solution for sentiment analysis, in which an optimal set of classifiers is selected and integrated automatically. An approximation algorithm, which can be proven to be a 2-approximation, is designed to tackle the combinatorial explosion problem of classifier set selection. Finally, extensive experiments on real-world datasets show that the proposed solution outperforms not only the best single-classifier methods but also state-of-the-art ensemble learning competitors.
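The abstract does not spell out the selection objective, so the following is only a minimal sketch of how a greedy classifier-set selection followed by integration might look. It assumes a greedy rule of the kind used for max-sum diversification problems (half the quality gain plus a weighted sum of distances to the already selected items), with validation accuracy as the quality score, pairwise disagreement as the distance, and majority voting as the integration step. The function names, the parameter lam, and the toy predictions are all hypothetical and are not taken from the paper.

```python
def accuracy(preds, labels):
    """Fraction of validation documents a classifier labels correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Fraction of validation documents on which two classifiers disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def greedy_select(predictions, labels, k, lam=1.0):
    """Greedily pick k classifiers, trading off accuracy against diversity.

    predictions maps a (hypothetical) classifier name to its predicted
    labels on a validation set. Each step adds the classifier with the
    largest value of 0.5 * accuracy + lam * (sum of disagreements with
    the classifiers already selected).
    """
    selected = []
    remaining = set(predictions)

    def gain(name):
        quality = 0.5 * accuracy(predictions[name], labels)
        diversity = sum(
            disagreement(predictions[name], predictions[s]) for s in selected
        )
        return quality + lam * diversity

    while remaining and len(selected) < k:
        # Sorting makes tie-breaking deterministic (first name wins).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


def majority_vote(predictions, selected):
    """Integrate the selected classifiers by majority vote over +1/-1 labels."""
    n_docs = len(predictions[selected[0]])
    return [
        1 if sum(predictions[s][i] for s in selected) >= 0 else -1
        for i in range(n_docs)
    ]


# Toy validation set: +1 = positive sentiment, -1 = negative sentiment.
labels = [1, 1, -1, -1, 1, -1]
predictions = {
    "nb":       [-1, 1, -1, -1, -1, -1],  # wrong on documents 0 and 4
    "svm":      [1, 1, -1, -1, 1, 1],     # wrong on document 5
    "svm_copy": [1, 1, -1, -1, 1, 1],     # redundant duplicate of "svm"
    "maxent":   [1, -1, -1, -1, 1, -1],   # wrong on document 1
}

selected = greedy_select(predictions, labels, k=3)
combined = majority_vote(predictions, selected)
print(selected, accuracy(combined, labels))
```

In this toy setting the diversity term steers the selection away from the redundant duplicate, and the three chosen classifiers make their errors on different documents, so the vote corrects each of them. Real classifiers would of course be trained models evaluated on held-out data, and the weight lam would need tuning per domain.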
