Learning multi-level representations for affective image recognition

Neural Computing and Applications, Volume 34, pages 14107–14120, 2022

Hao Zhang¹, Dan Xu¹, Gaifang Luo², Kangjian He¹

¹School of Information Science and Engineering, Yunnan University, Kunming, China

²School of Software, Shanxi Agricultural University, Jinzhong, China

Abstract

Images can convey intense affective experiences and influence people emotionally. With the prevalence of online pictures and videos, evaluating emotions from visual content has attracted considerable attention. Affective image recognition aims to automatically classify the emotions conveyed by digital images. Existing studies, whether based on hand-crafted features or deep networks, mainly focus on either low-level visual features or high-level semantic representations rather than considering both. To better understand how deep networks work on affective recognition tasks, we investigate their convolutional features by visualizing them. Our analysis shows that the hierarchical CNN model relies mainly on deep semantic information while ignoring shallow visual details, which are essential for evoking emotions. To form a more general and discriminative representation, we propose a multi-level hybrid model that learns and integrates deep semantic and shallow visual representations for sentiment classification. In addition, this study shows that class imbalance hurts performance, because the dominant category of an affective dataset overwhelms training and causes the deep network to degenerate. We therefore introduce a new loss function to optimize the deep affective model. Experimental results on several affective image recognition datasets show that our model outperforms various existing methods. The source code is publicly available.
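The abstract does not specify the proposed loss function. Purely as a point of reference for the class-imbalance problem it describes, the sketch below implements a well-known alternative: a class-balanced focal loss. It combines the "effective number of samples" weighting of Cui et al. (2019) with the focal term of Lin et al. (2017). This is not the authors' loss. The function names and the `beta` and `gamma` defaults are illustrative assumptions, and the class counts in the usage example are hypothetical.

```python
import math
from typing import List, Sequence


def class_balanced_weights(counts: Sequence[int], beta: float = 0.999) -> List[float]:
    """Per-class weights from the effective number of samples (Cui et al., 2019).

    Effective number E_n = (1 - beta**n) / (1 - beta); each class is weighted by
    1 / E_n, then weights are rescaled to sum to the number of classes.
    Assumes every count is positive and 0 <= beta < 1.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def balanced_focal_loss(batch_logits: Sequence[Sequence[float]],
                        labels: Sequence[int],
                        weights: Sequence[float],
                        gamma: float = 2.0) -> float:
    """Mean class-weighted focal loss over a mini-batch.

    Each sample contributes -w_y * (1 - p_y)**gamma * log(p_y), where p_y is the
    softmax probability of its true class y. With gamma = 0 this reduces to
    class-weighted cross-entropy.
    """
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        p_y = max(softmax(logits)[y], 1e-12)  # clamp to avoid log(0)
        total += -weights[y] * (1.0 - p_y) ** gamma * math.log(p_y)
    return total / len(labels)


if __name__ == "__main__":
    # Hypothetical class counts: one dominant emotion category, two rare ones.
    counts = [2000, 500, 100]
    w = class_balanced_weights(counts, beta=0.999)
    print("class weights:", [round(x, 3) for x in w])
    batch = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.4]]
    print("loss:", balanced_focal_loss(batch, [0, 2], w))
```

The design choice here is that rare classes receive larger weights, so the dominant category no longer drives most of the gradient, while the focal term further down-weights examples the model already classifies confidently.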

