Facial expression synthesis based on similar faces

Multimedia Tools and Applications - Volume 80 - Pages 36465-36489 - 2021
Rafael Luiz Testa¹, Ariane Machado-Lima¹, Fátima L. S. Nunes¹
¹ School of Arts, Sciences and Humanities of the University of São Paulo, São Paulo, Brazil

Abstract

Facial expression synthesis has many applications in animation, human-computer interaction, entertainment, and training for people with psychological disorders. Its goal is to change the facial expression of an image, typically by reenacting the facial movements of a sample image on a target image. Deformation-based methods usually select the sample image manually, so their results vary with this choice. This study differs from the existing literature by proposing and evaluating techniques that consider the similarity between facial images when selecting the source image. The main objective is to investigate how source-image selection influences the generated facial expressions of emotion. We propose three techniques for selecting similar faces in the facial expression synthesis process and compare them with other approaches. We also compare the synthesized emotions with the results of recent methods from the literature using objective metrics. Our findings show that one of the proposed techniques performs better at finding similar faces and achieves similar or better synthesis results than the literature. In addition, a visual analysis shows that similar faces can improve the realism of the synthesized images, especially in comparison with randomly selected face images.
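The core idea described above is to replace the manual choice of a sample face with a similarity search over candidate faces. The paper does not specify its similarity measures here, so the following is only a minimal sketch of one plausible variant: faces represented as aligned, scale-normalized landmark vectors, with the source image chosen as the nearest neighbor under Euclidean distance. All names and the toy landmark data are hypothetical.

```python
import math

def face_distance(a, b):
    """Euclidean distance between two flattened landmark vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_most_similar(target, candidates):
    """Return the index of the candidate face closest to the target.

    `target` and each candidate are flattened (x, y) landmark vectors,
    assumed to be already aligned and scale-normalized.
    """
    return min(range(len(candidates)),
               key=lambda i: face_distance(target, candidates[i]))

# Toy example with 3-landmark "faces" (6 coordinates each).
target = [0.0, 0.0, 1.0, 0.0, 0.5, 1.0]
candidates = [
    [0.2, 0.1, 1.1, 0.0, 0.5, 1.2],  # somewhat different geometry
    [0.0, 0.0, 1.0, 0.1, 0.5, 1.0],  # nearly identical to the target
    [0.9, 0.9, 0.1, 0.8, 0.4, 0.2],  # very different geometry
]
best = select_most_similar(target, candidates)  # index of the nearest face
```

In practice the landmark vectors would come from a detector such as the one in dlib, and the selected candidate would then serve as the source image for the deformation-based synthesis step.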

Keywords

#Facial expression synthesis #Human-computer interaction #Facial images #Emotion #Synthesis techniques
