An empirical study of sentiment analysis utilizing machine learning and deep learning algorithms
Journal of Computational Social Science, pp. 1–17, 2023
Abstract
Among text-mining studies, one of the most studied topics is text classification, which has been applied in various domains, including medicine, social media, and academia. As a sub-problem of text classification, sentiment analysis has been widely investigated as a way to classify textual content that is often opinion-based. In particular, user reviews and experiential feedback on products or services have served as fundamental data sources for sentiment analysis efforts. With rapid technological advances, social media platforms such as Twitter, Facebook, and Reddit have become central opinion-sharing mediums since the early 2000s. In this work, we build various machine learning models to solve the sentiment analysis problem on a Reddit comments dataset. Our experimental models achieve F1 scores in the range of 73–76%. We present comparative performance scores obtained by traditional machine learning and deep learning models and discuss the results.
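To make the kind of "traditional machine learning" baseline and F1 evaluation described above more concrete, here is a minimal, dependency-free sketch: a bag-of-words multinomial Naive Bayes classifier scored with macro-averaged F1. Naive Bayes, the simple regex tokenizer, the three-way label scheme (-1 negative, 0 neutral, 1 positive), and the toy comments are all illustrative assumptions; they are not the models, preprocessing, or Reddit data used in the study, and the abstract does not state which F1 averaging the authors used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and extract alphanumeric/apostrophe runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of all tokens seen in training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihoods; unseen tokens are skipped.
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * v
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class present in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical toy comments; labels: 1 = positive, -1 = negative, 0 = neutral.
train_texts = [
    "i love this, great work", "this is awesome and helpful", "what a fantastic idea",
    "terrible take, i hate it", "this is awful and stupid", "worst idea ever",
    "the meeting is on tuesday", "he posted the link yesterday", "the thread has two replies",
]
train_labels = [1, 1, 1, -1, -1, -1, 0, 0, 0]

model = MultinomialNB().fit(train_texts, train_labels)
test_texts = ["great and helpful idea", "awful, i hate this", "the link is in the thread"]
test_labels = [1, -1, 0]
preds = model.predict(test_texts)
print(preds, macro_f1(test_labels, preds))  # → [1, -1, 0] 1.0
```

Macro averaging weights each sentiment class equally, which matters when neutral, positive, and negative comments are imbalanced. In practice one would typically use a library such as scikit-learn (for example `TfidfVectorizer`, `LogisticRegression`, and `f1_score(..., average="macro")`) on a held-out split of the real data rather than a hand-rolled classifier.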