Perceptual audio features for emotion detection

EURASIP Journal on Audio, Speech, and Music Processing - Volume 2012, Issue 1 - 2012

Mehmet Sezgin¹, Bilge Günsel¹, Güneş Karabulut Kurt¹

¹Multimedia Signal Processing and Pattern Recognition Lab., Department of Electronics and Communications, Istanbul Technical University, Istanbul, Turkey

Abstract

In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics defined in the Perceptual Evaluation of Audio Quality (PEAQ) method, ITU Recommendation BS.1387. Starting from the outer and middle ear models of the auditory system, we base our features on the masked perceptual loudness, which defines relatively objective criteria for emotion detection. The features, computed in critical bands based on the reference concept, include the partial loudness of the emotional difference, the emotional difference-to-perceptual mask ratio, measures of alterations of temporal envelopes, measures of harmonics of the emotional difference, the occurrence probability of emotional blocks, and the perceptual bandwidth. A soft-majority voting decision rule that strengthens conventional majority voting is proposed to assess the classifier outputs. Compared with state-of-the-art systems, including the Munich Open-Source Emotion and Affect Recognition Toolkit, the Hidden Markov Toolkit, and Generalized Discriminant Analysis, the proposed approach improves emotion recognition rates by 7-16% on EMO-DB and by 7-11% on VAM for the "all" and "valence" tasks.
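The abstract does not spell out the soft-majority voting rule, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a classifier that emits per-class scores for each block of an utterance, and contrasts conventional (hard) majority voting with a simple soft variant that sums the scores. The function names, the emotion labels, and the score values are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Assumption: each block of an utterance yields a dict of per-class scores.
BlockScores = Dict[str, float]


def majority_vote(block_scores: List[BlockScores]) -> str:
    """Hard voting: every block casts one vote for its top-scoring label."""
    votes = Counter(max(scores, key=scores.get) for scores in block_scores)
    return votes.most_common(1)[0][0]


def soft_vote(block_scores: List[BlockScores]) -> str:
    """Soft voting: sum the per-class scores over blocks, pick the largest total."""
    totals: Dict[str, float] = {}
    for scores in block_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    return max(totals, key=totals.get)


# Hypothetical scores: two weakly confident blocks and one very confident block.
blocks = [
    {"anger": 0.51, "neutral": 0.49},
    {"anger": 0.52, "neutral": 0.48},
    {"anger": 0.05, "neutral": 0.95},
]

print(majority_vote(blocks))  # anger   (2 votes against 1)
print(soft_vote(blocks))      # neutral (1.92 against 1.08)
```

In this toy case the two rules disagree: hard voting counts two marginal decisions as much as one confident decision, while the soft variant lets classifier confidence weigh in. Whether the paper's rule weights the outputs in the same way cannot be determined from the abstract.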
