A cloud-based face video retrieval system with deep learning

Springer Science and Business Media LLC - Volume 76 - Pages 8473-8493 - 2020
Feng-Cheng Lin¹, Huu-Huy Ngo¹, Chyi-Ren Dow¹
¹Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan

Abstract

Face video retrieval is an attractive research topic in computer vision. However, it remains challenging because of significant variation in pose, illumination conditions, occlusions, and facial expressions. Face recognition plays a vital role in video content analysis. In addition, deep neural networks are being actively studied, and deep learning models have been widely used for object detection and, in particular, for face recognition. Therefore, this study proposes a cloud-based face video retrieval system with deep learning. First, a dataset is collected and pre-processed. To produce a useful dataset for the CNN models, blurry images are removed, and face alignment is applied to the remaining images. The final dataset is then constructed and used to pre-train the CNN models (VGGFace, ArcFace, and FaceNet) for face recognition. We compare the results of these three models and choose the most efficient one to develop the system. To issue a query, users type in the name of a person. If the system detects a new person, it enrolls that person. The result is a list of images and the times associated with those images. In addition, a system prototype is implemented to verify the feasibility of the proposed system. Experimental results demonstrate that the system performs well in terms of recognition accuracy and computational time.
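
The query and enrollment steps described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `FaceIndex` and `Appearance` names, the in-memory dictionaries, the cosine-similarity matching, and the 0.6 threshold are all hypothetical, and the face embeddings are assumed to come from whichever CNN model was selected.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Appearance:
    """One detected face: the frame image it came from and its time in the video."""
    image_path: str
    time_sec: float


@dataclass
class FaceIndex:
    """Hypothetical in-memory index; a deployed system would use cloud storage."""
    threshold: float = 0.6  # assumed cosine-similarity cutoff, not taken from the paper
    templates: dict = field(default_factory=dict)    # name -> reference embedding
    appearances: dict = field(default_factory=dict)  # name -> list of Appearance

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def add_face(self, embedding, image_path, time_sec, new_name=None):
        """Match a face to a known person, or enroll it under new_name."""
        best_name, best_sim = None, -1.0
        for name, ref in self.templates.items():
            sim = self._cosine(embedding, ref)
            if sim > best_sim:
                best_name, best_sim = name, sim
        if best_name is None or best_sim < self.threshold:
            if new_name is None:
                return None  # unknown face and no name supplied: skip it
            best_name = new_name
            # Enroll the new person; keep an existing template if one is present.
            self.templates.setdefault(best_name, list(embedding))
        self.appearances.setdefault(best_name, []).append(
            Appearance(image_path, time_sec)
        )
        return best_name

    def query(self, name):
        """Return (image, time) pairs for a person, ordered by time."""
        hits = self.appearances.get(name, [])
        ordered = sorted(hits, key=lambda a: a.time_sec)
        return [(a.image_path, a.time_sec) for a in ordered]


index = FaceIndex()
index.add_face([1.0, 0.0, 0.0], "frame_0120.jpg", 4.0, new_name="Alice")  # enrolls Alice
index.add_face([0.9, 0.1, 0.0], "frame_0045.jpg", 1.5)  # close to Alice's template
print(index.query("Alice"))
```

Typing a name then reduces to a dictionary lookup, so query cost does not depend on the number of enrolled people; the linear scan over templates in `add_face` is the part a larger deployment would likely replace with an approximate nearest-neighbor search.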
