Speech-based recognition of self-reported and observed emotion in a dimensional space

Speech Communication, Vol. 54, pp. 1049–1063, 2012
Khiet P. Truong1, David A. van Leeuwen2, Franciska M.G. de Jong1
1University of Twente, Human Media Interaction, P.O. Box 217, 7500 AE Enschede, The Netherlands
2Radboud University Nijmegen, Centre for Language and Speech Technology, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands

References

Ang, J., Dhillon, R., Krupski, A., Shriberg, E., Stolcke, A., 2002. Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2002), pp. 2037–2040.
Auberge, V., Audibert, N., Rilliard, A., 2006. Auto-annotation: an alternative method to label expressive corpora. In: Proc. Fifth Internat. Conf. on Language Resources and Evaluation (LREC 2006).
Banse, 1996, Acoustic profiles in vocal emotion expression, J. Pers. Soc. Psychol., 70, 614, 10.1037/0022-3514.70.3.614
Batliner, A., Fischer, K., Huber, R., Spilker, J., Nöth, E., 2000. Desperately seeking emotions: actors, wizards, and human beings. In: Cowie, R., Douglas-Cowie, E., Schröder, M. (Eds.), Proc. ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, pp. 195–200.
Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V., 2006. Combining efforts for improving automatic classification of emotional user states. In: Language Technologies (IS-LTC), pp. 240–245.
Biersack, S., Kempe, V., 2005. Tracing vocal expression of emotion along the speech chain: do listeners perceive what speakers feel? In: Proc. ISCA Workshop on Plasticity in Speech Perception (PSP2005), pp. 211–214.
Boersma, P., Weenink, D., 2009. Praat: doing phonetics by computer (Version 5.1.07) [Computer Program]. <http://www.praat.org/> Retrieved 16.06.09.
Busso, C., Narayanan, S.S., 2008. The expression and perception of emotions: comparing assessments of self versus others. In: Proc. Interspeech 2008, pp. 257–260.
Chang, C.-C., Lin, C.-J., 2001. LIBSVM: a library for support vector machines. <http://www.csie.ntu.edu.tw/~cjlin/libsvm>.
Cowie, 2003, Describing the emotional states that are expressed in speech, Speech Commun., 40, 5, 10.1016/S0167-6393(02)00071-7
Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., Schroeder, M., 2000. FEELTRACE: an instrument for recording perceived emotion in real time. In: Proc. ISCA ITRW on Speech and Emotion, pp. 19–24.
Cowie, 2001, Emotion recognition in human-computer interaction, IEEE Signal Process. Mag., 18, 32, 10.1109/79.911197
Dellaert, F., Polzin, T., Waibel, A., 1996. Recognizing emotion in speech. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 1996), pp. 1970–1973.
Den Uyl, M.J., Van Kuilenburg, H., 2005. The FaceReader: online facial expression recognition. In: Proc. Measuring Behavior, pp. 589–590.
Devillers, L., Lamel, L., Vasilescu, I., 2003. Emotion detection in task-oriented spoken dialogues. In: Proc. IEEE Internat. Conf. on Multimedia and Expo (ICME’03), pp. 549–552.
Devillers, 2005, Challenges in real-life emotion annotation and machine learning based detection, Neural Networks, 18, 407, 10.1016/j.neunet.2005.03.007
Douglas-Cowie, E., Devillers, L., Martin, J.-C., Cowie, R., Savvidou, S., Abrilian, S., Cox, C., 2005. Multimodal databases of everyday emotion: facing up to complexity. In: Proc. Interspeech 2005, pp. 813–816.
Ekman, P., 1972. Universals and cultural differences in facial expressions of emotion. In: Cole, J. (Ed.), Nebraska Symposium on Motivation, pp. 207–283.
Ekman, 1975
Eyben, 2010, On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues, J. Multimodal User Interf., 3, 7, 10.1007/s12193-009-0032-6
Giannakopoulos, T., Pikrakis, A., Theodoridis, S., 2009. A dimensional approach to emotion recognition of speech from movies. In: Proc. IEEE Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP’09), pp. 65–68.
Grimm, 2007, Primitives-based evaluation and estimation of emotions in speech, Speech Commun., 49, 787, 10.1016/j.specom.2007.01.010
Grimm, M., Kroschel, K., Narayanan, S., 2007b. Support vector regression for automatic recognition of spontaneous emotions in speech. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2007), pp. 1085–1088.
Gunes, H., Schuller, B., Pantic, M., Cowie, R., 2011. Emotion representation, analysis and synthesis in continuous space: a survey. In: Proc. IEEE Internat. Conf. on Automatic Face & Gesture Recognition and Workshops (FG2011), pp. 827–834.
Hanjalic, 2005, Affective video content representation and modeling, IEEE Trans. Multimedia, 7, 143, 10.1109/TMM.2004.840618
Joachims, T., 1998. Text categorization with support vector machines: learning with many relevant features. In: Proc. 10th European Conf. on Machine Learning (ECML-98), pp. 137–142.
Johnstone, 2005, Affective speech elicited with a computer game, Emotion, 5, 513, 10.1037/1528-3542.5.4.513
Kim, J., André, E., Rehm, M., Vogt, T., Wagner, J., 2005. Integrating information from speech and physiological signals to achieve emotional sensitivity. In: Proc. Interspeech, pp. 809–812.
Kwon, O.-W., Chan, K., Hao, J., Lee, T.-W., 2003. Emotion recognition by speech signals. In: Proc. Eurospeech, pp. 125–128.
Lang, 1995, The emotion probe, Amer. Psychol., 50, 372, 10.1037/0003-066X.50.5.372
Lazzaro, N., 2004. Why We Play Games: Four Keys to More Emotion Without Story.
Lee, C.M., Narayanan, S., Pieraccini, R., 2002. Classifying emotions in human-machine spoken dialogs. In: Proc. IEEE Internat. Conf. on Multimedia and Expo (ICME ’02), pp. 737–740.
Liscombe, J., Venditti, J., Hirschberg, J., 2003. Classifying subject ratings of emotional speech using acoustic features. In: Proc. Eurospeech, pp. 725–728.
Mower, E., Mataric, M.J., Narayanan, S.S., 2009. Evaluating evaluators: a case study in understanding the benefits and pitfalls of multi-evaluator modeling. In: Proc. Interspeech 2009, pp. 1583–1586.
Nicolaou, M.A., Gunes, H., Pantic, M., 2010. Automatic segmentation of spontaneous data using dimensional labels from multiple coders. In: Proc. Internat. Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, pp. 43–48.
Nicolaou, 2011, Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space, IEEE Trans. Affect. Comput., 2, 92, 10.1109/T-AFFC.2011.9
Nwe, 2003, Speech Commun., 41, 603, 10.1016/S0167-6393(03)00099-2
Petrushin, V.A., 1999. Emotion in speech: recognition and application to call centers. In: Proc. 1999 Conf. on Artificial Neural Networks in Engineering (ANNIE’99).
Polzin, T., Waibel, A., 1998. Detecting emotions in speech. In: Proc. Cooperative Multimodal Communication (CMC’98).
Ravaja, 2006, Spatial presence and emotions during video game playing: does it matter with whom you play?, Presence: Teleoper. Virtual Environ., 15, 381, 10.1162/pres.15.4.381
Russell, 1980, A circumplex model of affect, J. Pers. Soc. Psychol., 39, 1161, 10.1037/h0077714
Salton, 1988, Term-weighting approaches in automatic text retrieval, Inform. Process. Manage., 24, 513, 10.1016/0306-4573(88)90021-0
Scherer, K.R., 2010. The component process model: architecture for a comprehensive computational model of emergent emotion. In: Scherer, K.R., Bänziger, T., Roesch, E. (Eds.), Blueprint for Affective Computing: A Sourcebook, pp. 47–70.
Schlosberg, 1954, Psychol. Rev., 61, 81, 10.1037/h0054570
Schuller, B., Rigoll, G., Lang, M., 2003. Hidden Markov model-based speech emotion recognition. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2003), pp. 1–4.
Smola, 2004, A tutorial on support vector regression, Stat. Comput., 14, 199, 10.1023/B:STCO.0000035301.49549.88
Tato, R., Santos, R., Kompe, R., Pardo, J.M., 2002. Emotional space improves emotion recognition. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2002), pp. 2029–2032.
Truong, K.P., Neerincx, M.A., Van Leeuwen, D.A., 2008. Assessing agreement of observer- and self-annotations in spontaneous multimodal emotion data. In: Proc. Interspeech 2008, pp. 318–321.
Truong, K.P., Raaijmakers, S., 2008. Automatic recognition of spontaneous emotions in speech using acoustic and lexical features. In: Proc. Fifth Joint Workshop on Machine Learning and Multimodal Interaction (MLMI 2008), pp. 161–172.
Truong, K.P., Van Leeuwen, D.A., Neerincx, M.A., De Jong, F.M.G., 2009. Arousal and valence prediction in spontaneous emotional speech: felt versus perceived emotion. In: Proc. Interspeech, pp. 2027–2030.
Vapnik, 2002
Ververidis, D., Kotropoulos, C., 2005. Emotional speech classification using Gaussian mixture models and the sequential floating forward selection algorithm. In: Proc. IEEE Internat. Conf. on Multimedia and Expo (ICME 2005), pp. 1500–1503.
Ververidis, 2006, Emotional speech recognition: resources, features, and methods, Speech Commun., 48, 1162, 10.1016/j.specom.2006.04.003
Wang, N., Marsella, S., 2006. Introducing EVG: an emotion evoking game. In: Proc. Internat. Conf. on Interactive Virtual Agents (IVA 2006), pp. 282–291.
Williams, 1972, Emotions and speech: some acoustical correlates, J. Acoust. Soc. Amer., 52, 1238, 10.1121/1.1913238
Wöllmer, M., Eyben, F., Reiter, S., Schuller, B., Cox, C., Douglas-Cowie, E., Cowie, R., 2008. Abandoning emotion classes – towards continuous emotion recognition with modeling of long-range dependencies. In: Proc. Interspeech, pp. 597–600.
Wöllmer, M., Eyben, F., Schuller, B., Douglas-Cowie, E., Cowie, R., 2009. Data-driven clustering in emotional space for affect recognition using discriminatively trained LSTM networks. In: Proc. Interspeech, pp. 1595–1598.
Wundt, 1874
Yildirim, S., Lee, C.M., Lee, S., Potamianos, A., Narayanan, S.S., 2005. Detecting politeness and frustration state of a child in a conversational computer game. In: Proc. Interspeech 2005, pp. 2209–2212.
Yu, C., Aoki, P., Woodruff, A., 2004. Detecting user engagement in everyday conversations. In: Proc. Interspeech, pp. 1329–1332.
Zeng, Z., Zhang, Z., Pianfetti, B., Tu, J., Huang, T.S., 2005. Audio-visual affect recognition in activation-evaluation space. In: Proc. IEEE Internat. Conf. on Multimedia and Expo (ICME’05), pp. 828–831.