Efficient feature selection techniques for sentiment analysis

Multimedia Tools and Applications - Volume 79 - Pages 6313-6335 - 2019

Avinash Madasu¹, Sivasankar Elango²

¹Samsung R and D Institute India, Bengaluru, Bagmane Constellation Business Park, Bengaluru, India

²Department of Computer Science, National Institute of Technology, Tiruchirappalli, India

Abstract

Sentiment analysis is a field of study concerned with identifying the opinions expressed in text and classifying them as positive, negative, or neutral in polarity. Feature selection is a crucial step in machine learning. In this paper, we study the performance of different feature selection techniques for sentiment analysis. Term Frequency-Inverse Document Frequency (TF-IDF) is used as the feature extraction technique to build the feature vocabulary. Several Feature Selection (FS) techniques are evaluated to select the best subset of features from that vocabulary. Classifiers are then trained on the selected features: Logistic Regression (LR), Support Vector Machines (SVM), Decision Tree (DT), and Naive Bayes (NB). The ensemble techniques Bagging and Random Subspace are applied to these classifiers to improve sentiment analysis performance. We show that the best FS techniques, when combined with ensemble methods, achieve remarkable results on sentiment analysis. We also compare the FS methods trained with Bagging and Random Subspace against a range of neural network architectures. We show that FS techniques trained with ensemble classifiers outperform neural networks while requiring significantly less training time and far fewer parameters, thereby eliminating the need for extensive hyperparameter tuning.
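The pipeline outlined in the abstract (TF-IDF extraction, feature selection, then an ensemble of classifiers) can be illustrated with a minimal, standard-library-only sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the abstract does not name its FS techniques, so chi-square is used as one common filter-style scorer; the smoothed idf formula is one common variant; a nearest-centroid rule stands in for LR, SVM, DT, or NB to keep the example short; and all function names and the toy reviews are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Map each tokenized document to {term: tf * idf} (smoothed idf, an assumed variant)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: (c / len(d)) * idf[t] for t, c in Counter(d).items()} for d in docs]


def chi2_scores(docs, labels):
    """Chi-square statistic of term presence versus a binary label (0 or 1)."""
    n, n_pos = len(docs), sum(labels)
    scores = {}
    for term in {t for d in docs for t in d}:
        a = sum(1 for d, y in zip(docs, labels) if y == 1 and term in d)  # present, positive
        b = sum(1 for d, y in zip(docs, labels) if y == 0 and term in d)  # present, negative
        c, e = n_pos - a, (n - n_pos) - b  # absent in positive / negative documents
        denom = (a + b) * (c + e) * (a + c) * (b + e)
        scores[term] = n * (a * e - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def centroid_fit(rows, labels, feats):
    """Stand-in classifier: mean TF-IDF weight per class over the selected features."""
    cents = {}
    for y in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == y]
        cents[y] = {f: sum(r.get(f, 0.0) for r in members) / len(members) for f in feats}
    return cents


def centroid_predict(cents, row):
    """Return the class whose centroid has the largest dot product with the row."""
    return max(sorted(cents), key=lambda y: sum(w * row.get(f, 0.0) for f, w in cents[y].items()))


def bagging_fit(rows, labels, feats, n_models=11, seed=0):
    """Bagging: fit one model per bootstrap sample (documents drawn with replacement)."""
    rng = random.Random(seed)
    idx = range(len(rows))
    models = []
    for _ in range(n_models):
        sample = [rng.choice(idx) for _ in idx]
        models.append(centroid_fit([rows[i] for i in sample], [labels[i] for i in sample], feats))
    return models


def bagging_predict(models, row):
    """Majority vote over the bagged models."""
    return Counter(centroid_predict(m, row) for m in models).most_common(1)[0][0]


# Toy data (hypothetical): 1 = positive review, 0 = negative review.
texts = ["great movie loved it", "great acting movie", "great plot movie", "great fun movie",
         "awful movie hated it", "awful acting movie", "awful plot movie", "awful boring movie"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
docs = [t.split() for t in texts]
rows = tfidf(docs)
feats = select_top_k(chi2_scores(docs, labels), k=2)
models = bagging_fit(rows, labels, feats)
print(feats)
print(bagging_predict(models, {"great": 1.0}))
```

Bagging resamples documents, as above; a Random Subspace variant would instead train each ensemble member on a random subset of the selected features. In practice, a library such as scikit-learn provides equivalent building blocks (TfidfVectorizer, SelectKBest, BaggingClassifier), which would be the more idiomatic choice for real experiments.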

Keywords

#sentiment analysis #feature selection #TF-IDF #logistic regression #support vector machine #decision tree #Naive Bayes #ensemble techniques #neural networks

