Classification of fear emotion in speech through acoustic and behavioral cues
Abstract
Machine-based classification of emotion in speech has become a requirement for natural and intuitive human-computer interaction. Because emotional speech recognition systems use a person's voice to detect their emotional state spontaneously and take appropriate follow-up actions, they can be widely deployed for various purposes in call centers and emotion-aware communication services. Emotional speech recognition systems are developed primarily from emotional acoustic data. Although several emotional acoustic databases are available for emotion recognition systems in other countries, no real-situation data involving the "fear emotion" currently exist. Therefore, in this study we collected acoustic recordings representing real emergency and fearful situations from an emergency call center. To classify callers' emotions more accurately, we also included an additional behavioral feature, "interjections", which can be classified as a type of disfluency arising from the cognitive disruption observed in spontaneous speech when the speaker becomes highly emotional. We used a Support Vector Machine (SVM) with the interjection feature, as well as commonly used acoustic features (i.e., fundamental frequency variation, voice intensity variation, and Mel-Frequency Cepstral Coefficients; MFCCs), to determine which emotion category the acoustic data belonged to. Our results show that MFCCs were the best acoustic feature for classifying spontaneous fearful speech. In addition, we demonstrated the validity of behavioral features as an important criterion for improving emotion classification.
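The classification setup the abstract describes — an SVM over per-utterance acoustic features (fundamental-frequency variation, intensity variation, MFCC statistics) augmented with a behavioral interjection count — can be sketched as follows. This is not the authors' code: the feature values below are synthetic placeholders standing in for measurements a real pipeline would extract from audio, and the class separation is deliberately exaggerated for illustration.

```python
# Hedged sketch of the abstract's method: SVM classification of
# fear vs. neutral utterances from acoustic + behavioral features.
# All feature values are synthetic stand-ins, not real acoustic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def synth_utterance(fearful: bool) -> np.ndarray:
    """One feature vector: [F0 variation, intensity variation, 13 MFCC means, interjection count]."""
    f0_var = rng.normal(40 if fearful else 15, 5)       # Hz: fear raises pitch variability
    int_var = rng.normal(8 if fearful else 4, 1)        # dB: intensity variation
    mfcc_means = rng.normal(2 if fearful else 0, 1, 13) # placeholder cepstral means
    interjections = rng.poisson(3 if fearful else 0.5)  # behavioral disfluency count
    return np.concatenate(([f0_var, int_var], mfcc_means, [interjections]))

# 200 synthetic utterances, alternating fear (1) and neutral (0)
X = np.array([synth_utterance(i % 2 == 0) for i in range(200)])
y = np.array([i % 2 == 0 for i in range(200)], dtype=int)

# Standardize features, then fit an RBF-kernel SVM on a train split
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X[:150], y[:150])
accuracy = clf.score(X[150:], y[150:])
```

In practice the MFCC means would come from an MFCC extractor applied to the call recordings, and F0/intensity variation from a pitch and energy tracker; the interjection count would be annotated or detected from transcripts. The point of the sketch is the feature-vector layout: acoustic and behavioral cues concatenated into one input to a single SVM.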
Keywords
#emotion classification #fear emotion #acoustic data #behavioral features #spontaneous speech