Photo-caricature face recognition based on dynamic multi-task learning
Abstract
Face recognition on realistic visual images (e.g., photos) has been studied extensively and has made significant progress over the past decade. However, face recognition between photos and caricatures remains a challenging problem. Unlike photos, caricatures come in different artistic styles that introduce extreme non-rigid distortions. The resulting large representational gap between the photo and caricature modalities is a major obstacle for photo-caricature face recognition. In this paper, we propose to perform photo-caricature face recognition with multi-task learning, which can learn the features of the different modalities through different tasks. Instead of setting the task weights manually, as in conventional multi-task learning, this work proposes a dynamic weight learning module that automatically generates the task weights according to the importance of the tasks during training. The learned task weights let the network focus on training the hard tasks instead of getting stuck over-training the easy ones. Experimental results demonstrate the effectiveness of the proposed dynamic multi-task learning for photo-caricature face recognition. Performance on the CaVI and WebCaricature datasets is superior to that of state-of-the-art methods. The implementation code is available at https://github.com/hengxyz/cari-visual-recognition-via-multitask-learning.git
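To illustrate the idea of weighting tasks by difficulty rather than by hand, here is a minimal sketch. It is not the module proposed in the paper: it assumes, purely for illustration, that the weights are a softmax over the current per-task losses, so a task with a larger loss receives a larger weight. The function names, the `temperature` parameter, and the three example loss values are all hypothetical.

```python
import math


def dynamic_task_weights(task_losses, temperature=1.0):
    """Return one weight per task; the weights are positive and sum to 1.

    Assumption (not from the paper): weights are a softmax over the
    current task losses, so harder tasks (larger loss) get more weight.
    """
    scaled = [loss / temperature for loss in task_losses]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_multitask_loss(task_losses, weights):
    """Combine the per-task losses into a single training objective."""
    return sum(w * loss for w, loss in zip(weights, task_losses))


# Hypothetical losses for three tasks at one training step, for example
# photo recognition, caricature recognition, and cross-modal matching.
losses = [0.2, 1.5, 0.9]
weights = dynamic_task_weights(losses)
total_loss = weighted_multitask_loss(losses, weights)
```

With fixed manual weights, the first (easy) task would keep contributing the same share of the gradient throughout training. In this sketch its share shrinks as its loss falls, which matches the motivation stated in the abstract. In a real network the weights would normally be produced by a trainable layer, not computed directly from the loss values.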
