Real-time head pose estimation using multi-task deep neural network

Robotics and Autonomous Systems - Volume 103 - Pages 1–12 - 2018
Byungtae Ahn¹, Dong-Geol Choi¹, Jaesik Park¹, In So Kweon¹
¹Robotics and Computer Vision Lab, KAIST, Daejeon, Republic of Korea

References

Hug, 2004, Estimating face pose by facial asymmetry and geometry, 651
Cootes, 1995, Active shape models - their training and application, Comput. Vis. Image Underst., 61, 38, doi:10.1006/cviu.1995.1004
Cootes, 2001, Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell., 23, 681, doi:10.1109/34.927467
Martins, 2008, Accurate single view model-based head pose estimation, 1
Xiong, 2013, Supervised descent method and its applications to face alignment, 532
F. De la Torre, W.S. Chu, X. Xiong, F. Vicente, X. Ding, J. Cohn, 2015, IntraFace, in: IEEE International Conference on Automatic Face and Gesture Recognition, FG, Vol. 1, pp. 1–8
Dopfer, 2014, 3D active appearance model alignment using intensity and range data, Robot. Auton. Syst., 62, 168, doi:10.1016/j.robot.2013.11.002
Tawari, 2014, Continuous head movement estimator for driver assistance: issues, algorithms, and on-road evaluations, IEEE Trans. Intell. Transp. Syst., 15, 818, doi:10.1109/TITS.2014.2300870
Narayanan, 2014, Yaw estimation using cylindrical and ellipsoidal face models, IEEE Trans. Intell. Transp. Syst., 15, 2308, doi:10.1109/TITS.2014.2313371
Narayanan, 2016, Estimation of driver head yaw angle using a generic geometric model, IEEE Trans. Intell. Transp. Syst., 17, 3446, doi:10.1109/TITS.2016.2551298
Zhu, 2012, Face detection, pose estimation and landmark localization in the wild, 2879
Vicente, 2015, Driver gaze tracking and eyes off the road detection system, IEEE Trans. Intell. Transp. Syst., 16, 2014, doi:10.1109/TITS.2015.2396031
Balasubramanian, 2007, Biased manifold embedding: a framework for person-independent head pose estimation
Foytik, 2013, A two-layer framework for piecewise linear manifold-based head pose estimation, Int. J. Comput. Vis., 101, 270, doi:10.1007/s11263-012-0567-y
Grujić, 2008, 3D facial pose estimation by image retrieval
Gourier, 2004, Estimating face orientation from robust detection of salient facial features
Huang, 2010, Head pose estimation based on random forests for multiclass classification, 934
BenAbdelkader, 2010, Robust head pose estimation using supervised manifold learning, 518
Ji, 2011, Robust head pose estimation via convex regularized sparse regression, 3617
Breitenstein, 2008, Real-time face pose estimation from single range images
Fanelli, 2011, Real time head pose estimation from consumer depth cameras, 101
Fanelli, 2013, Random forests for real time 3D face analysis, Int. J. Comput. Vis., 101, 437, doi:10.1007/s11263-012-0549-0
LeCun, 1989, Backpropagation applied to handwritten zip code recognition, Neural Comput., 1, 541, doi:10.1162/neco.1989.1.4.541
Krizhevsky, 2012, ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, 2015, Going deeper with convolutions, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR
K. He, X. Zhang, S. Ren, J. Sun, 2016, Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 770–778
Sun, 2013, Deep convolutional network cascade for facial point detection, 3476
H. Li, Z. Lin, X. Shen, J. Brandt, G. Hua, 2015, A convolutional neural network cascade for face detection, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 5325–5334
Zhang, 2016, Learning deep representation for face alignment with auxiliary attributes, IEEE Trans. Pattern Anal. Mach. Intell., 38, 918, doi:10.1109/TPAMI.2015.2469286
B. Ahn, J. Park, I.S. Kweon, 2014, Real-time head orientation from a monocular camera using deep neural network, in: Asian Conference on Computer Vision, ACCV, pp. 82–96
Caruana, 1997, Multitask learning, Mach. Learn., 41, doi:10.1023/A:1007379606734
A. Vezhnevets, J.M. Buhmann, 2010, Towards weakly supervised semantic segmentation by means of multiple instance and multitask learning, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 3249–3256
Romera-Paredes, 2012, Exploiting unrelated tasks in multi-task learning, Adv. Neural Inf. Process. Syst.
M. Lapin, B. Schiele, M. Hein, 2014, Scalable multitask representation learning for scene classification, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 1434–1441
Toshev, 2014, DeepPose: human pose estimation via deep neural networks
Li, 2015, Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network, Int. J. Comput. Vis., 113, 19, doi:10.1007/s11263-014-0767-8
H. Jung, S. Lee, J. Yim, S. Park, J. Kim, 2015, Joint fine-tuning in deep neural networks for facial expression recognition, in: IEEE International Conference on Computer Vision Workshops, ICCVW, pp. 2983–2991
Nair, 2010, Rectified linear units improve restricted Boltzmann machines, 807
R. Girshick, J. Donahue, T. Darrell, J. Malik, 2014, Rich feature hierarchies for accurate object detection and semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 580–587
M. Koestinger, P. Wohlhart, P.M. Roth, H. Bischof, 2011, Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization, in: IEEE International Workshop on Benchmarking Facial Image Analysis Technologies
T. Weise, S. Bouaziz, H. Li, M. Pauly, 2011, Realtime performance-based facial animation, 30 (4)
Huang, 2007
Belhumeur, 2013, Localizing parts of faces using a consensus of exemplars, IEEE Trans. Pattern Anal. Mach. Intell., 35, 2930, doi:10.1109/TPAMI.2013.23
V. Le, J. Brandt, Z. Lin, L. Bourdev, T.S. Huang, 2012, Interactive facial feature localization, in: European Conference on Computer Vision, ECCV, pp. 679–692
X.P. Burgos-Artizzu, P. Perona, P. Dollár, 2013, Robust face landmark estimation under occlusion, in: IEEE International Conference on Computer Vision Workshops, ICCVW, pp. 1513–1520
P. Lucey, J.F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, I. Matthews, 2010, The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW, pp. 94–101
Viola, 2001, Robust real-time object detection, Int. J. Comput. Vis.
J. Paone, D. Bolme, R. Ferrell, D. Aykac, T. Karnowski, 2015, Baseline face detection, head pose estimation, and coarse direction detection for facial data in the SHRP2 naturalistic driving study, in: IEEE Intelligent Vehicles Symposium (IV), pp. 174–179
S. Yang, P. Luo, C.C. Loy, X. Tang, 2016, WIDER FACE: A face detection benchmark, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR
Z. Liu, P. Luo, X. Wang, X. Tang, 2015, Deep learning face attributes in the wild, in: Proceedings of International Conference on Computer Vision, ICCV
V. Drouard, S. Ba, G. Evangelidis, A. Deleforge, R. Horaud, 2015, Head pose estimation via probabilistic high-dimensional regression, in: IEEE International Conference on Image Processing, ICIP, pp. 4624–4628
Nuevo, 2010, RSMAT: Robust simultaneous modeling and tracking, Pattern Recognit. Lett., 31, 2455, doi:10.1016/j.patrec.2010.07.016