Integrating the optimal classifier set for sentiment analysis

Social Network Analysis and Mining - Volume 5 - Pages 1-13 - 2015

Yuming Lin¹, Xiaoling Wang², You Li³, Aoying Zhou²

¹Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, China

²Institute for Data Science and Engineering, East China Normal University, Shanghai, China

³Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, Guilin University of Electronic Technology, Guilin, China

Abstract

Automatically identifying users’ sentiment is important for many Web applications, such as recommender systems and business intelligence. Sentiment analysis can be treated as a classification task that tries to identify the overall sentiment a user expresses in a document. However, it is difficult to choose a classifier for a particular domain, since each classifier performs differently across domains. We therefore propose a three-phase multiple-classifier solution for sentiment analysis, in which an optimal set of classifiers is selected and integrated automatically. An approximation algorithm, which can be proven to be a 2-approximation, is designed to tackle the combinatorial explosion of classifier set selection. Extensive experiments on real-world datasets show that the proposed solution outperforms not only the best single-classifier methods but also state-of-the-art ensemble learning competitors.
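The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the general idea: pick a small classifier subset greedily, then integrate it by voting. The objective (validation accuracy plus pairwise disagreement), the weight `lam`, the factor 0.5, the majority-vote integration, and all function and classifier names are assumptions made for illustration, not the authors' method.

```python
def accuracy(preds, labels):
    """Fraction of validation samples a classifier labels correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Fraction of samples on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def greedy_select(predictions, labels, k, lam=1.0):
    """Greedily pick k classifiers, trading accuracy against diversity.

    predictions: dict mapping a classifier name to its predicted labels
                 on a held-out validation set (hypothetical input format).
    """
    selected = []
    remaining = set(predictions)
    while remaining and len(selected) < k:
        def gain(name):
            acc = accuracy(predictions[name], labels)
            div = sum(disagreement(predictions[name], predictions[s])
                      for s in selected)
            # Assumed scoring rule: half the quality gain plus weighted diversity.
            return 0.5 * acc + lam * div
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


def majority_vote(predictions, selected):
    """Integrate the selected classifiers by per-sample majority vote."""
    n = len(predictions[selected[0]])
    combined = []
    for i in range(n):
        votes = [predictions[s][i] for s in selected]
        combined.append(max(sorted(set(votes)), key=votes.count))
    return combined


# Toy validation data: 1 = positive, 0 = negative (made up for illustration).
labels = [1, 1, 0, 0, 1, 0]
predictions = {
    "maxent":   [1, 0, 0, 0, 1, 0],
    "nb":       [1, 1, 0, 1, 1, 0],
    "svm":      [1, 1, 0, 0, 1, 1],
    "svm_copy": [1, 1, 0, 0, 1, 1],  # redundant duplicate of "svm"
}
chosen = greedy_select(predictions, labels, k=3)
ensemble = majority_vote(predictions, chosen)
print(chosen, accuracy(ensemble, labels))
```

In this toy setup each classifier errs on a different sample, so the diversity term steers the selection away from taking both `svm` and its duplicate, and the vote of the three chosen classifiers corrects each individual error. A greedy pass like this evaluates only a linear number of candidates per step instead of enumerating all subsets, which is the motivation for an approximation algorithm in this setting.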

