Visual Sentiment Prediction with Attribute Augmentation and Multi-attention Mechanism

Springer Science and Business Media LLC - Volume 51 - Pages 2403-2416 - 2020
Zhuanghui Wu, Min Meng1, Jigang Wu1
1School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China

Abstract

Recently, many methods that exploit attention mechanisms to discover relevant local regions via visual attributes have demonstrated promising performance in visual sentiment prediction. In these methods, accurate detection of visual attributes is vital for identifying the sentiment-relevant regions, which in turn is crucial for successful assessment of visual sentiment. However, existing work merely applies basic convolutional neural network strategies to visual attribute detection and fails to obtain satisfactory results because of the semantic gap between visual features and subjective attributes. Moreover, it is difficult for existing attention models to localize subtle sentiment-relevant regions, especially when attribute detection performs relatively poorly. To address these problems, we first design a multi-task learning based approach for visual attribute detection. By augmenting the attributes with sentiment supervision, the semantic gap can be effectively reduced. We then develop a multi-attention model that jointly discovers and localizes multiple relevant local regions given the predicted attributes. The classifier built on top of these regions achieves a significant improvement in visual sentiment prediction. Experimental results demonstrate the superiority of our method over previous approaches.
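The two ideas named in the abstract, a multi-task objective and attention over several local regions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product scoring, the loss weight `lam`, and the toy feature vectors are all assumptions chosen for illustration, since the abstract does not give these details.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def multi_attention_pool(regions, queries):
    """Pool region features once per query vector.

    Assumption: each query stands in for one attention head derived from
    the predicted attributes; scoring is a plain dot product.
    Returns (pooled vectors, attention weights), one entry per query.
    """
    dim = len(regions[0])
    pooled, weights = [], []
    for q in queries:
        w = softmax([dot(r, q) for r in regions])
        pooled.append([sum(w[i] * regions[i][d] for i in range(len(regions)))
                       for d in range(dim)])
        weights.append(w)
    return pooled, weights

def multitask_loss(attr_loss, senti_loss, lam=0.5):
    """Attribute loss plus a weighted sentiment loss (lam is hypothetical)."""
    return attr_loss + lam * senti_loss

# Toy data: three region feature vectors and two attention queries.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]
pooled, weights = multi_attention_pool(regions, queries)
```

Each query yields its own weight distribution over the regions, so different heads can focus on different parts of the image; a classifier would then consume the pooled vectors.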
