Multi-Level Context Pyramid Network for Visual Sentiment Analysis

Sensors, Volume 21, Issue 6, Page 2136
Haochun Ou1, Chunmei Qing1, Xiangmin Xu1, Jianxiu Jin1
1School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China

Abstract

Sharing feelings through images and short videos has become one of the main forms of expression on social networks. Because visual content can affect people's emotions, the task of analyzing the sentimental information in visual content has attracted increasing attention. Most current methods focus on improving local emotional representations to achieve better sentiment-analysis performance, and ignore the problem of perceiving objects of different scales and different emotional intensities in complex scenes. In this paper, based on alterable-scale and multi-level local regional emotional affinity analysis under a global perspective, we propose a multi-level context pyramid network (MCPNet) for visual sentiment analysis that combines local and global representations to improve classification performance. First, ResNet-101 is employed as the backbone to obtain multi-level emotional representations capturing different degrees of semantic and detailed information. Next, multi-scale adaptive context modules (MACM) are proposed to learn the degree of sentiment correlation among regions of different scales in the image and to extract multi-scale context features for each level of the deep representation. Finally, the context features of the different levels are combined into a multi-cue sentimental feature for image sentiment classification. Extensive experiments on seven commonly used visual sentiment datasets show that our method outperforms state-of-the-art methods; in particular, its accuracy on the FI dataset exceeds 90%.
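The pipeline the abstract describes — multi-level backbone features, per-level multi-scale context weighted by regional affinity under a global perspective, then fusion into one multi-cue feature — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pooling scales, the cosine-affinity weighting, and the function names are illustrative assumptions, and random arrays stand in for ResNet-101 stage outputs.

```python
import numpy as np

def multi_scale_context(feat, scales=(1, 2, 4)):
    """Sketch of a multi-scale adaptive context step: pool the feature
    map at several grid scales, weight each regional descriptor by its
    cosine affinity to the global descriptor, and average the weighted
    contexts into one vector. `feat` has shape (C, H, W); H and W are
    assumed divisible by every scale."""
    C, H, W = feat.shape
    global_vec = feat.mean(axis=(1, 2))                  # global perspective
    contexts = []
    for s in scales:
        # average-pool into an s x s grid of regional descriptors
        regions = feat.reshape(C, s, H // s, s, W // s).mean(axis=(2, 4))
        regions = regions.reshape(C, -1).T               # (s*s, C)
        # affinity of each region to the global descriptor
        aff = regions @ global_vec
        aff /= np.linalg.norm(regions, axis=1) * np.linalg.norm(global_vec) + 1e-8
        w = np.exp(aff) / np.exp(aff).sum()              # softmax weights
        contexts.append((w[:, None] * regions).sum(axis=0))
    return np.mean(contexts, axis=0)                     # (C,)

def mcp_forward(levels):
    """Concatenate per-level context vectors into one multi-cue feature."""
    return np.concatenate([multi_scale_context(f) for f in levels])

# toy multi-level features standing in for ResNet-101 stage outputs
rng = np.random.default_rng(0)
levels = [rng.standard_normal((8, 16, 16)), rng.standard_normal((16, 8, 8))]
fused = mcp_forward(levels)
print(fused.shape)  # (24,)
```

A classifier head (e.g. a fully connected layer over `fused`) would then map the multi-cue feature to sentiment classes; in the paper this whole pipeline is learned end-to-end rather than computed with fixed affinity weights as above.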

Keywords


References

Fan, S., Shen, Z., Jiang, M., Koenig, B.L., Xu, J., Kankanhalli, M.S., and Zhao, Q. (2018, June 18–22). Emotional attention: A study of image sentiment and visual attention. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.

Brosch, 2010, The perception and categorisation of emotional stimuli: A review, Cogn. Emot., 24, 377, 10.1080/02699930902975754

Ortis, 2020, Survey on visual sentiment analysis, IET Image Process., 14, 1440, 10.1049/iet-ipr.2019.1270

Rao, 2019, Multi-level region-based convolutional neural network for image emotion classification, Neurocomputing, 333, 429, 10.1016/j.neucom.2018.12.053

Joshi, 2011, Aesthetics and emotions in images, IEEE Signal Process. Mag., 28, 94, 10.1109/MSP.2011.941851

You, Q., Jin, H., and Luo, J. (2017, February 4–9). Visual sentiment analysis by attending on local image regions. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.

Fan, S., Jiang, M., Shen, Z., Koenig, B.L., Kankanhalli, M.S., and Zhao, Q. (2017, October 23–27). The role of visual attention in sentiment prediction. Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA.

Song, 2018, Boosting image sentiment analysis with visual attention, Neurocomputing, 312, 218, 10.1016/j.neucom.2018.05.104

Rao, T., Li, X., and Xu, M. (2019). Learning multi-level deep representations for image emotion classification. Neural Process. Lett., 1–19.

Chen, T., Yu, F.X., Chen, J., Cui, Y., Chen, Y.Y., and Chang, S.F. (2014, November 3–7). Object-based visual sentiment concept analysis and application. Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.

Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., and Ling, H. (2019, January 27–February 1). M2Det: A single-shot object detector based on multi-level feature pyramid network. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA.

He, J., Deng, Z., Zhou, L., Wang, Y., and Qiao, Y. (2019, June 16–20). Adaptive pyramid context network for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.

Surekha, 2020, Deep Neural Network-based human emotion recognition by computer vision, Advances in Electrical and Computer Technologies, Springer LNEE, 672, 453

Cerf, 2009, Faces and text attract gaze independent of the task: Experimental data and computer model, J. Vis., 9, 10, 10.1167/9.12.10

You, Q., Luo, J., Jin, H., and Yang, J. (2016, February 12–17). Building a large scale dataset for image emotion recognition: The fine print and the benchmark. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.

Borth, D., Ji, R., Chen, T., Breuel, T., and Chang, S.F. (2013, October 21–25). Large-scale visual sentiment ontology and detectors using adjective noun pairs. Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain.

Chen, T., Borth, D., Darrell, T., and Chang, S.F. (2014). DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks. arXiv.

Li, 2018, Image sentiment prediction based on textual descriptions with adjective noun pairs, Multimed. Tools Appl., 77, 1115, 10.1007/s11042-016-4310-5

Yuan, J., Mcdonough, S., You, Q., and Luo, J. (2013, August 11–14). Sentribute: Image sentiment analysis from a mid-level perspective. Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, Chicago, IL, USA.

Yang, J., She, D., and Sun, M. (2017, August 19–25). Joint image emotion classification and distribution learning via deep convolutional neural network. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia.

Kim, 2018, Building emotional machines: Recognizing image emotions through deep neural networks, IEEE Trans. Multimed., 20, 2980, 10.1109/TMM.2018.2827782

Ali, A.R., Shahid, U., Ali, M., and Ho, J. (2017, January 27–29). High-level concepts for affective understanding of images. Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA.

Peng, K.C., Sadovnik, A., Gallagher, A., and Chen, T. (2016, September 25–28). Where do emotions come from? Predicting the emotion stimuli map. Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.

Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2015). Object detectors emerge in deep scene CNNs. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA.

Yang, J., She, D., Lai, Y.K., Rosin, P.L., and Yang, M.H. (2018, June 18–22). Weakly supervised coupled networks for visual sentiment analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.

Yadav, 2020, A deep learning architecture of RA-DLNet for visual sentiment analysis, Multimed. Syst., 26, 431, 10.1007/s00530-020-00656-7

Wu, 2020, Visual Sentiment Analysis by Combining Global and Local Information, Neural Process. Lett., 51, 2063, 10.1007/s11063-019-10027-7

Yang, 2018, Visual sentiment prediction based on automatic discovery of affective regions, IEEE Trans. Multimed., 20, 2513, 10.1109/TMM.2018.2803520

Ren, 2017, Faster r-cnn: Towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell., 39, 1137, 10.1109/TPAMI.2016.2577031

Mikels, 2005, Emotional category data on images from the International Affective Picture System, Behav. Res. Methods, 37, 626, 10.3758/BF03192732

Machajdik, J., and Hanbury, A. (2010, October 25–29). Affective image classification using features inspired by psychology and art theory. Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy.

You, Q., Luo, J., Jin, H., and Yang, J. (2015, January 25–30). Robust image sentiment analysis using progressively trained and domain transferred deep networks. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA.

Lang, 1997, International affective picture system (IAPS): Technical manual and affective ratings, NIMH Cent. Study Emot. Atten., 1, 39

Peng, K.C., Chen, T., Sadovnik, A., and Gallagher, A. (2015, June 7–12). A mixed bag of emotions: Model, predict, and transfer emotion distributions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.

Panda, R., Zhang, J., Li, H., Lee, J.Y., Lu, X., and Roy-Chowdhury, A.K. (2018, September 8–14). Contemplating visual emotions: Understanding and overcoming dataset bias. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.

Ekman, P., Friesen, W.V., and Ellsworth, P. (1982). What emotion categories or dimensions can observers judge from facial behavior?. Emot. Hum. Face, 39–55.

He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 27–30). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.

Siersdorfer, S., Minack, E., Deng, F., and Hare, J. (2010, October 25–29). Analyzing and predicting sentiment of images on the social web. Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy.

Zhao, S., Gao, Y., Jiang, X., Yao, H., Chua, T.S., and Sun, X. (2014, November 3–7). Exploring principles-of-art features for image emotion recognition. Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.

Rao, T., Xu, M., Liu, H., Wang, J., and Burnett, I. (2016, September 25–28). Multi-scale blocks based image emotion classification using multiple instance learning. Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.

Krizhevsky, 2017, Imagenet classification with deep convolutional neural networks, Commun. ACM, 60, 84, 10.1145/3065386

Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv.

Zhu, X., Li, L., Zhang, W., Rao, T., Xu, M., Huang, Q., and Xu, D. (2017, August 19–25). Dependency exploitation: A unified CNN-RNN approach for visual emotion recognition. Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia.

(2016, July 25). COCO: Common Objects in Context. Available online: http://mscoco.org/dataset/#detections-leaderboard.

Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016, June 27–30). Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.