Visual sentiment analysis with semantic correlation enhancement

Complex & Intelligent Systems - Pages 1-13 - 2023
Hao Zhang¹, Yanan Liu¹, Zhaoyu Xiong¹, Zhichao Wu², Dan Xu¹
¹ School of Information Science and Engineering, Yunnan University, Kunming, China
² School of Artificial Intelligence, Beijing Normal University, Beijing, China

Abstract

Visual sentiment analysis is in great demand because it provides a computational method for recognizing sentiment in the abundant visual content on social media sites. Most existing methods use CNNs to extract a variety of visual attributes for image sentiment prediction, but they fail to comprehensively consider the correlation among visual components and are consequently limited by the receptive field of the convolutional layers. In this work, we propose a visual semantic correlation network (VSCNet), a Transformer-based visual sentiment prediction model. Specifically, global visual features are captured by an extended attention network, built by stacking a well-designed extended attention mechanism in the manner of a Transformer. An off-the-shelf object query tool is used to determine local candidates for potential affective regions, which filters out redundant and noisy visual proposals. All candidates considered affective are embedded into a computable semantic space. Finally, a fusion strategy integrates the semantic representations with the visual features for sentiment analysis. Extensive experiments show that our method outperforms previous studies on five annotated public image sentiment datasets without any training tricks. In particular, it achieves 1.8% higher accuracy on the FI benchmark than other state-of-the-art methods.
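The pipeline described in the abstract (global attention features, filtered object candidates, semantic embedding, fusion) can be sketched roughly as follows. This is a minimal illustration and not the authors' VSCNet implementation: the abstract does not specify the extended attention mechanism, the object query tool, or the fusion strategy, so plain scaled dot-product self-attention, a score threshold on detections, a toy label-embedding table, and simple concatenation are all stand-in assumptions. Every function name, the `min_score` parameter, and the example data are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.
    Assumption: a stand-in for the paper's unspecified extended attention."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def filter_proposals(proposals, min_score=0.5):
    """Drop low-confidence detections (hypothetical filtering rule)."""
    return [p for p in proposals if p["score"] >= min_score]

def embed_labels(labels, table, dim):
    """Map labels into a semantic space via a toy lookup table, then average."""
    vecs = [table[label] for label in labels if label in table]
    return mean_pool(vecs) if vecs else [0.0] * dim

def fuse(visual, semantic):
    """Fuse by concatenation (assumption; the paper's strategy may differ)."""
    return visual + semantic

# Toy inputs: three 2-d patch features and two detector outputs.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
proposals = [{"label": "dog", "score": 0.9}, {"label": "wall", "score": 0.2}]
table = {"dog": [0.8, 0.1], "wall": [0.0, 0.0]}

visual = mean_pool(self_attention(patches))        # global visual feature
kept = filter_proposals(proposals)                 # candidate affective regions
semantic = embed_labels([p["label"] for p in kept], table, dim=2)
fused = fuse(visual, semantic)                     # input to a sentiment classifier
```

In a real system the fused vector would feed a trained classification head. Concatenation is used here only because it is the simplest way to show how the two feature streams meet.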
