A decision tree using the ID3 algorithm for English semantic analysis

International Journal of Speech Technology, Volume 20, Issue 3, Pages 593-613, 2017
Phu, Vo Ngoc [1]; Tran, Vo Thi Ngoc [2]; Chau, Vo Thi Ngoc [3]; Dat, Nguyen Duy [4]; Duy, Khanh Ly Doan [5]
[1] Institute of Research and Development, Duy Tan University - DTU, Da Nang, Vietnam
[2] School of Industrial Management (SIM), Ho Chi Minh City University of Technology - HCMUT, Vietnam National University, Ho Chi Minh City, Vietnam
[3] Computer Science & Engineering (CSE), Ho Chi Minh City University of Technology - HCMUT, Vietnam National University, Ho Chi Minh City, Vietnam
[4] Faculty of Information Technology, Ly Tu Trong Technical College, Ho Chi Minh City, Vietnam
[5] Faculty of Information Technology, Ho Chi Minh City University of Foreign Languages, Ho Chi Minh City, Vietnam

Abstract

Natural language processing has been studied for many years and has been applied in many research projects and commercial applications. In this paper, we propose a new model for document-level sentiment classification of English text. The model uses the ID3 decision tree algorithm to classify the semantics of English documents as positive, negative, or neutral. Its classification is based on a set of rules generated by applying the ID3 algorithm to the 115,000 English sentences in our English training data set. We test the model on an English testing data set of 25,000 English documents and achieve an accuracy of 63.6% in sentiment classification.
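
The abstract does not say how sentences are turned into attributes, so the following is only a minimal sketch of generic ID3 (entropy and information gain over categorical attributes), not the authors' implementation. The word-presence features `has_good` and `has_bad`, the four toy training rows, and the `classify` helper with its `default="neutral"` fallback are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, features):
    """Build a tree as nested dicts; a leaf is simply a class label."""
    if len(set(labels)) == 1:          # pure node: stop splitting
        return labels[0]
    if not features:                   # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = id3([r for r, _ in pairs],
                              [l for _, l in pairs],
                              remaining)
    return {"feature": best, "branches": branches}


def classify(tree, row, default="neutral"):
    """Walk the tree; unseen attribute values fall back to `default`."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], default)
    return tree


# Hypothetical toy training set: one row per sentence, with assumed
# word-presence attributes (not taken from the paper).
rows = [
    {"has_good": True,  "has_bad": False},
    {"has_good": False, "has_bad": True},
    {"has_good": False, "has_bad": False},
    {"has_good": True,  "has_bad": True},
]
labels = ["positive", "negative", "neutral", "neutral"]

tree = id3(rows, labels, ["has_good", "has_bad"])
print(classify(tree, {"has_good": True, "has_bad": False}))
```

Each root-to-leaf path of the resulting tree can be read as one if-then rule, which is the sense in which ID3 "generates rules" from training sentences.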

Keywords

#natural language processing #sentiment classification #ID3 algorithm #decision tree #English semantics
