Fear emotion classification in speech via acoustic and behavioral cues

Multimedia Tools and Applications - Volume 78 - Pages 2345-2366 - 2018
Shin-ae Yoon1, Guiyoung Son1, Soonil Kwon1
1Department of Software, College of Software and Convergence Technology, Sejong University, Seoul, Republic of Korea

Abstract

Machine-based classification of emotion in speech has become a requirement for natural and familiar human-computer interaction. Because emotional speech recognition systems use a person's voice to detect his or her emotional state spontaneously and then take appropriate follow-up actions, they can be widely applied for various purposes in call centers and emotion-based media services. Emotional speech recognition systems are developed primarily from emotional acoustic data. Although several emotional acoustic databases are available for emotion recognition systems in other countries, no real-situation data related to the "fear emotion" currently exist. Therefore, in this study, we collected acoustic data recordings representing real emergency and frightening situations from an emergency call center. To classify callers' emotions more accurately, we also included an additional behavioral feature, "interjections", which can be classified as a type of disfluency arising from the cognitive disruption observed in spontaneous speech when the speaker becomes overly emotional. We used a Support Vector Machine (SVM) with the interjection feature, as well as commonly used acoustic features (i.e., fundamental-frequency variation, vocal-intensity variation, and Mel-Frequency Cepstral Coefficients; MFCCs), to determine which emotion category the acoustic data belong to. Our results show that MFCCs are the best acoustic feature for classifying spontaneous fearful speech. In addition, we demonstrated the validity of behavioral features as an important criterion for improving emotion classification.
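The pipeline the abstract describes (per-utterance acoustic features plus an interjection-based behavioral feature, classified with an SVM) can be sketched as follows. This is an illustrative mock-up under stated assumptions, not the authors' implementation: the synthetic vectors stand in for MFCC statistics and an interjection-rate feature, and all dimensions, names, and parameter values are hypothetical.

```python
# Illustrative sketch: per-utterance feature vectors (random stand-ins
# for MFCC means/variances plus one interjection-rate feature) are
# classified as "fear" vs. "neutral" with an RBF-kernel SVM.
# This is NOT the authors' code; all settings are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_per_class = 100
# Hypothetical 27-dim vectors: 13 MFCC means + 13 MFCC variances
# + 1 interjection rate per utterance, drawn from shifted Gaussians
# so the two classes are separable for demonstration purposes.
fear = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, 27))
neutral = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, 27))
X = np.vstack([fear, neutral])
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = fear

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Standardize features, then fit the SVM classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {acc:.2f}")
```

In practice the MFCC statistics would be extracted from the call-center recordings with a speech-processing library rather than sampled from Gaussians, and the interjection feature would be counted from annotated disfluencies; the classifier stage, however, follows the same fit/score pattern.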

Keywords

#emotion classification #fear emotion #acoustic data #behavioral features #spontaneous speech
