A novel facial emotion recognition model using segmentation VGG-19 architecture

International Journal of Information Technology - Volume 15 - Pages 1777-1787 - 2023
S. Vignesh1, M. Savithadevi2, M. Sridevi2, Rajeswari Sridhar2
1Department of Electrical and Electronics Engineering, National Institute of Technology, Tiruchirappalli, India
2Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India

Abstract

Facial Emotion Recognition (FER) has gained popularity in recent years because of its many applications, including biometrics, detection of mental illness, understanding of human behavior, and psychological profiling. However, developing an accurate and robust FER pipeline remains challenging because several factors make it difficult to generalize across different emotions: pose variation, heterogeneity of facial structure, illumination, occlusion, low resolution, and aging. Many approaches have been developed to address these problems, such as the Histogram of Oriented Gradients (HOG) and the Local Binary Pattern (LBP) histogram, but these methods require manual feature selection. Convolutional Neural Networks (CNNs) overcome this limitation by learning features directly from the data, and they have shown great potential in FER tasks compared with conventional FER models. In this paper, we propose a novel CNN architecture that interfaces U-Net segmentation layers between Visual Geometry Group (VGG) layers, allowing the network to emphasize the more critical features in the feature map while also controlling the flow of redundant information through the VGG layers. Our model achieves state-of-the-art (SOTA) single-network accuracy on the FER-2013 dataset compared with other well-known FER models.
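To make the contrast with hand-crafted features concrete, the following is a minimal sketch of a basic 3x3 LBP histogram in plain Python. It is not the authors' code: the function name `lbp_histogram`, the clockwise bit ordering, the "greater than or equal" comparison, and the toy 4x4 patch are all assumptions chosen for illustration. The point is that every design decision here (neighbourhood size, bit order, bin count) is fixed by hand, which is the manual feature selection that a CNN avoids.

```python
def lbp_histogram(image):
    """Return a 256-bin histogram of basic 3x3 LBP codes for a 2D grayscale image.

    `image` is a list of equal-length rows of integer intensities. Border pixels
    are skipped because they lack a full 8-pixel neighbourhood.
    """
    # Neighbour offsets (row, col), clockwise starting from the top-left pixel.
    # This ordering is an illustrative assumption; other orderings are common.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    histogram = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            histogram[code] += 1
    return histogram


# Hypothetical 4x4 patch standing in for a 48x48 grayscale FER-2013 face crop.
patch = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
hist = lbp_histogram(patch)
```

The resulting histogram would typically be fed to a separate classifier. In the approach described in the abstract, by contrast, the convolutional layers learn which local patterns matter during training.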
