Empirical investigation of the temporal relations between speech and facial expressions of emotion

Journal on Multimodal User Interfaces, Volume 3, pp. 263–270, 2010
Stéphanie Buisine1, Yun Wang1,2, Ouriel Grynszpan3
1Arts et Métiers ParisTech, LCPI, Paris, France
2LIMSI-CNRS, Orsay Cedex, France
3Hôpital de la Salpêtrière, CNRS USR 3246, Université Pierre et Marie Curie, Paris, France

Abstract

Behavior models implemented within Embodied Conversational Agents (ECAs) require nonverbal communication to be tightly coordinated with speech. In this paper we present an empirical study exploring how the temporal coordination between speech and facial expressions of emotion influences users' perception of these emotions, measuring recognition performance, the perceived realism of the agent's behavior, and user preferences. We generated five conditions of temporal coordination between facial expression and speech: the facial expression displayed before a speech utterance, at the beginning of the utterance, throughout the utterance, at its end, or after it. Twenty-three subjects took part in the experiment and saw these five conditions applied to the display of six emotions (fear, joy, anger, disgust, surprise and sadness). Subjects recognized emotions most efficiently when the facial expression was displayed at the end of the spoken sentence. However, the combination that users rated as most realistic, and preferred over the others, was the facial expression displayed throughout the speech utterance. We review existing literature to position our work and discuss the relationship between realism and communication performance. We also provide animation guidelines and outline avenues for future work.
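
To make the five coordination conditions concrete, the sketch below shows one way an ECA animation scheduler might derive the display interval of a facial expression from the utterance interval. This is a minimal illustration of the timing schemes described in the abstract, not the authors' implementation; the `Alignment`, `Interval`, and `schedule_expression` names and the fixed one-second expression duration are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Alignment(Enum):
    """The five speech/expression timing conditions compared in the study."""
    BEFORE = "before"          # expression ends as the utterance begins
    AT_START = "at_start"      # expression begins with the utterance
    THROUGHOUT = "throughout"  # expression spans the whole utterance
    AT_END = "at_end"          # expression ends with the utterance
    AFTER = "after"            # expression begins as the utterance ends


@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds


def schedule_expression(utterance: Interval, alignment: Alignment,
                        expr_duration: float = 1.0) -> Interval:
    """Return the interval during which the facial expression is shown,
    relative to the speech utterance, for a given alignment condition."""
    if alignment is Alignment.BEFORE:
        return Interval(utterance.start - expr_duration, utterance.start)
    if alignment is Alignment.AT_START:
        return Interval(utterance.start, utterance.start + expr_duration)
    if alignment is Alignment.THROUGHOUT:
        return Interval(utterance.start, utterance.end)
    if alignment is Alignment.AT_END:
        return Interval(utterance.end - expr_duration, utterance.end)
    return Interval(utterance.end, utterance.end + expr_duration)  # AFTER


# Example: a 3-second utterance starting at t = 0
utt = Interval(0.0, 3.0)
for cond in Alignment:
    iv = schedule_expression(utt, cond)
    print(f"{cond.value:>10}: expression from {iv.start:.1f}s to {iv.end:.1f}s")
```

Under this reading of the results, the AT_END condition would maximize recognition accuracy, while THROUGHOUT would maximize perceived realism and user preference.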
