Perceptual audio features for emotion detection

Mehmet Sezgin, Bilge Günsel, Güneş Karabulut Kurt
Multimedia Signal Processing and Pattern Recognition Lab., Department of Electronics and Communications, Istanbul Technical University, Istanbul, Turkey

Abstract

In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics defined in the Perceptual Evaluation of Audio Quality (PEAQ) standard, ITU Recommendation BS.1387. Starting from the outer and middle ear models of the auditory system, we base our features on the masked perceptual loudness, which provides relatively objective criteria for emotion detection. The features are computed in critical bands relative to a reference signal and include the partial loudness of the emotional difference, the emotional difference-to-perceptual mask ratio, measures of alterations of temporal envelopes, measures of harmonics of the emotional difference, the occurrence probability of emotional blocks, and the perceptual bandwidth. To assess the classifier outputs, we propose a soft-majority voting decision rule that strengthens conventional majority voting. Compared with state-of-the-art systems, including the Munich Open-Source Emotion and Affect Recognition Toolkit, the Hidden Markov Toolkit, and Generalized Discriminant Analysis, the proposed approach improves emotion recognition rates by 7-16% on EMO-DB and by 7-11% on VAM for the "all" and "valence" tasks.
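The abstract does not spell out the soft-majority voting rule, so the following is only a rough illustration of the general idea of softening a majority vote, not the authors' formulation. It assumes, hypothetically, that a classifier emits per-block class scores (the `blocks` data, the label names, and both function names are invented for this sketch) and contrasts a conventional hard vote over per-block winners with a generic soft vote that sums the scores.

```python
from collections import Counter


def hard_majority_vote(block_scores):
    """Conventional majority vote: each block votes for its top-scoring class."""
    winners = [max(scores, key=scores.get) for scores in block_scores]
    return Counter(winners).most_common(1)[0][0]


def soft_vote(block_scores):
    """Generic soft vote (an assumption, not the paper's rule):
    sum the per-class scores over all blocks and pick the largest total."""
    totals = {}
    for scores in block_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    return max(totals, key=totals.get)


# Hypothetical per-block scores for one utterance (illustrative values only).
blocks = [
    {"anger": 0.55, "joy": 0.45},  # weak preference for anger
    {"anger": 0.52, "joy": 0.48},  # weak preference for anger
    {"anger": 0.10, "joy": 0.90},  # strong preference for joy
]

print(hard_majority_vote(blocks))  # two of three blocks favor anger
print(soft_vote(blocks))           # summed scores favor joy
```

In this toy case the two rules disagree: the hard vote discards how confident each block was, while the soft vote lets one confident block outweigh two marginal ones. Whether the proposed rule weights blocks in this way is not stated in the abstract.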

