A novel facial emotion recognition model using segmentation VGG-19 architecture
Abstract
Facial Emotion Recognition (FER) has gained popularity in recent years due to its many applications, including biometrics, detection of mental illness, understanding of human behavior, and psychological profiling. However, developing an accurate and robust FER pipeline remains challenging because multiple factors make it difficult to generalize across different emotions: pose variation, heterogeneity of facial structure, illumination, occlusion, low resolution, and aging. Many approaches have been developed to address these problems, such as the Histogram of Oriented Gradients (HOG) and the Local Binary Pattern (LBP) histogram, but they rely on manual feature selection. Convolutional Neural Networks (CNNs) remove this manual step and have shown great potential in FER tasks because they learn their feature extraction directly from the data. In this paper, we propose a novel CNN architecture that interleaves U-Net segmentation layers between Visual Geometry Group (VGG) layers, allowing the network to emphasize the most critical features in the feature map while limiting the flow of redundant information through the VGG layers. Our model achieves state-of-the-art (SOTA) single-network accuracy on the FER-2013 dataset compared with other well-known FER models.
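To make the idea concrete, below is a minimal PyTorch sketch of the kind of architecture the abstract describes: a small U-Net-style encoder-decoder gate inserted after each VGG-style convolutional block, producing a mask that re-weights the block's feature map. This is an illustrative assumption of how such an interleaving could be wired, not the authors' exact architecture; the module names (SegGate, SegVGG, vgg_block), channel widths, gating mechanism, and classifier head are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegGate(nn.Module):
    """Tiny U-Net-style encoder-decoder (hypothetical sizes) that produces a
    sigmoid mask used to re-weight the incoming feature map, suppressing
    redundant activations before they reach the next VGG block."""

    def __init__(self, channels):
        super().__init__()
        self.encode = nn.Sequential(
            nn.MaxPool2d(2, ceil_mode=True),                      # downsample
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decode = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),                                          # mask in [0, 1]
        )

    def forward(self, x):
        m = self.encode(x)
        m = F.interpolate(m, size=x.shape[-2:], mode="nearest")   # back to input size
        return x * self.decode(m)                                  # emphasize critical features


def vgg_block(in_c, out_c, n_convs):
    """Standard VGG-style block: n_convs conv+BN+ReLU layers, then 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [
            nn.Conv2d(in_c if i == 0 else out_c, out_c, 3, padding=1),
            nn.BatchNorm2d(out_c),
            nn.ReLU(inplace=True),
        ]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class SegVGG(nn.Module):
    """VGG-19-like backbone with a SegGate inserted after each conv block."""

    def __init__(self, num_classes=7):                             # FER-2013 has 7 emotion classes
        super().__init__()
        # (in_channels, out_channels, num_convs) per block; grayscale input assumed
        cfg = [(1, 64, 2), (64, 128, 2), (128, 256, 4), (256, 512, 4), (512, 512, 4)]
        blocks = []
        for in_c, out_c, n in cfg:
            blocks += [vgg_block(in_c, out_c, n), SegGate(out_c)]
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


if __name__ == "__main__":
    model = SegVGG()
    logits = model(torch.randn(1, 1, 48, 48))   # FER-2013 images are 48x48 grayscale
    print(logits.shape)                          # torch.Size([1, 7])
```

The design choice illustrated here is that each gate sees only the feature map of the block it follows and returns a tensor of the same shape, so the gates can be dropped between any pair of VGG blocks without changing the rest of the pipeline.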