Deeper cascaded peak-piloted network for weak expression recognition
Abstract
Facial expression recognition is a challenging problem in general, especially when expressions are weak. Deep neural networks have recently emerged as a powerful tool for expression recognition. However, because training samples are scarce, existing deep network-based methods cannot fully capture the critical and subtle details of weak expressions, which leads to unsatisfactory results. In this paper, we propose the Deeper Cascaded Peak-piloted Network (DCPN) for weak expression recognition. DCPN has three main aspects: (1) peak-piloted feature transformation, which uses the peak expression (easy samples) to supervise the non-peak expression (hard samples) of the same type and subject; (2) a specially designed back-propagation algorithm that drives the intermediate-layer feature maps of a non-peak expression close to those of the corresponding peak expression; and (3) a novel integration training method, cascaded fine-tune, which is proposed to prevent the network from overfitting. Experimental results on two popular facial expression databases, CK+ and Oulu-CASIA, show the superiority of the proposed DCPN over state-of-the-art methods.
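To make aspects (1) and (2) concrete, the following is a minimal sketch of a peak-piloted feature loss, not the authors' implementation. The abstract does not give the loss formula, so the squared L2 distance, the function name `peak_piloted_feature_loss`, the toy feature shapes, and the step size are all assumptions chosen for illustration. The sketch treats the peak features as a fixed target and returns a gradient for the non-peak features only, which is one simple way to pull non-peak feature maps toward their peak counterparts.

```python
import numpy as np


def peak_piloted_feature_loss(nonpeak_feats, peak_feats):
    """Squared L2 distance between paired features, averaged over the batch.

    Hypothetical helper (not from the paper). Both arrays have shape
    (batch, feature_dim); row i of each is assumed to come from the same
    subject and expression type. The peak features act as a constant
    target, so a gradient is returned for the non-peak features only.
    """
    diff = nonpeak_feats - peak_feats
    batch = nonpeak_feats.shape[0]
    loss = 0.5 * np.sum(diff ** 2) / batch
    grad_nonpeak = diff / batch  # d(loss) / d(nonpeak_feats)
    return loss, grad_nonpeak


# Toy data: non-peak features are noisy versions of the peak features.
rng = np.random.default_rng(0)
peak = rng.normal(size=(4, 8))
nonpeak = peak + 0.5 * rng.normal(size=(4, 8))

# Plain gradient descent on the non-peak features (step size is an assumption).
feats = nonpeak.copy()
losses = []
for _ in range(50):
    loss, grad = peak_piloted_feature_loss(feats, peak)
    losses.append(loss)
    feats -= 1.0 * grad
```

In a real network the gradient would flow into the layers that produce the non-peak features rather than into the features themselves, and this term would be combined with a classification loss; the toy loop only shows that the loss shrinks as the non-peak features approach the peak ones.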