Analysis of significant dialog events in realistic human–computer interaction

Journal on Multimodal User Interfaces - Volume 8, Issue 1, pp 75–86 - 2014
Dmytro Prylipko1, Dietmar Rösner2, Ingo Siegert1, Stephan Günther2, Rafael Friesen2, Matthias Haase3, Bogdan Vlasenko1, Andreas Wendemuth1
1IIKT & CBBS, Otto von Guericke University Magdeburg, Magdeburg, Germany
2IWS & CBBS, Otto von Guericke University Magdeburg, Magdeburg, Germany
3Department of Psychosomatic Medicine and Psychotherapy, Otto von Guericke University Magdeburg, Magdeburg, Germany

Abstract

Keywords


References

Batliner A, Fischer K, Huber R, Spilker J, Nöth E (2003) How to find trouble in communication. Speech Commun 40(1–2):117–143

Batliner A, Steidl S, Schuller B, Seppi D, Vogt T, Wagner J, Devillers L, Vidrascu L, Aharonson V, Kessous L (2011) Whodunnit—searching for the most important feature types signalling emotion-related user states in speech. Comput Speech Lang 25(1):4–28

Boersma P (2001) Praat, a system for doing phonetics by computer. Glot Int 5(9/10):341–345

Callejas Z, López-Cózar R (2008) Influence of contextual information in emotion annotation for spoken dialogue systems. Speech Commun 50:416–433

Campbell N (2007) On the use of nonverbal speech sounds in human communication. In: COST 2102 Workshop (Vietri sul Mare), LNAI. Springer, Berlin, Heidelberg, pp 117–128

Caridakis G, Karpouzis K, Wallace M, Kessous L, Amir N (2010) Multimodal user’s affective state analysis in naturalistic interaction. J Multimodal User Interfaces 3(1):49–66

Cohn JF, Schmidt K (2004) The timing of facial motion in posed and spontaneous smiles. Int J Wavelets Multiresolut Inf Process 2(2):121–132

Cowie R, Douglas-Cowie E, Tsapatsoulis N, Votsis G, Kollias S, Fellenz W, Taylor J (2001) Emotion recognition in human–computer interaction. IEEE Signal Process Mag 18(1):32–80

Douglas-Cowie E, Devillers L, Martin JC, Cowie R, Savvidou S, Abrilian S, Cox C (2005) Multimodal databases of everyday emotion: facing up to complexity. In: Proceedings of Interspeech’05, pp 813–816

Edlund J, Gustafson J, Heldner M, Hjalmarsson A (2008) Towards human-like spoken dialogue systems. Speech Commun 50(8):630–645

Frommer J, Rösner D, Haase M, Lange J, Friesen R, Otto M (2012) Detection and avoidance of failures in dialogues—Wizard of Oz experiment operator's manual. Pabst Science Publishers, Germany

Fukuda S, Matsuura Y (1996) Understanding of emotional feelings in sound. Trans Jpn Soc Mech Eng Part C 62(598):2293–2298

Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newslett 11(1):10–18

Jimenez-Fernandez A, Del Pozo F, Munoz C, Zoreda JL (1987) Pattern recognition in the vocal expression of emotional categories. In: Proceedings of the 9th Annual Conference of the IEEE Engineering in Medicine and Biology Society, pp 2090–2091

Jurafsky D, Martin JH (2009) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall, USA. http://www.cs.colorado.edu/%7Emartin/slp.html

Kapoor A, Burleson W, Picard RW (2007) Automatic prediction of frustration. Int J Hum Comput Stud 65(8):724–736

Krauss RM, Chen Y, Chawla P (1996) Nonverbal behavior and nonverbal communication: what do conversational hand gestures tell us? Adv Exp Soc Psychol 28:389–450

Lee CM, Narayanan S (2005) Toward detecting emotions in spoken dialogs. IEEE Trans Speech Audio Process 13(2):293–303

Prylipko D, Schuller B, Wendemuth A (2012) Fine-tuning HMMs for nonverbal vocalizations in spontaneous speech: a multicorpus perspective. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, pp 4625–4628

Rösner D, Friesen R, Otto M, Lange J, Haase M, Frommer J (2011) Intentionality in interacting with companion systems: an empirical approach. In: Human–computer interaction. Towards mobile and intelligent interaction environments. Springer, Berlin, pp 593–602

Rösner D, Frommer J, Andrich R, Friesen R, Haase M, Kunze M, Lange J, Otto M (2012) LAST MINUTE: a novel corpus to support emotion, sentiment and social signal processing. In: Proceedings of the International Conference on Language Resources and Evaluation, LREC'12

Rösner D, Kunze M, Otto M, Frommer J (2012) Linguistic analyses of the LAST MINUTE corpus. In: Proceedings of KONVENS’12, ÖGAI, pp 145–154

Scherer KR, Ceschi G (1997) Lost luggage: a field study of emotion-antecedent appraisal. Motiv Emot 21:211–235

Scherer S, Glodek M, Layher G, Schels M, Schmidt M, Brosch T, Tschechne S, Schwenker F, Neumann H, Palm G (2012) A generic framework for the inference of user states in human computer interaction. J Multimodal User Interfaces 6(3–4):117–141

Schmidt T, Schütte W (2010) FOLKER: an annotation tool for efficient transcription of natural, multi-party interaction. In: Proceedings of LREC'10, pp 2091–2096

Schuller B, Batliner A, Steidl S, Seppi D (2011) Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Commun 53(9–10):1062–1087

Selting M, et al (2009) Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). Gesprächsforschung - Online-Zeitschrift zur verbalen Interaktion 10:353–402

Siegert I, Böck R, Philippou-Hübner D, Vlasenko B, Wendemuth A (2011) Appropriate emotional labeling of non-acted speech using basic emotions, Geneva Emotion Wheel and Self-Assessment Manikins. In: Proceedings of ICME'11

Suwa M, Sugie N, Fujimora K (1978) A preliminary note on pattern recognition of human emotional expression. In: Proceedings of the IEEE International Conference on Pattern Recognition, pp 408–410

Vlasenko B, Prylipko D, Philippou-Hübner D, Wendemuth A (2011) Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions. In: Proceedings of Interspeech’11, pp 1577–1580

Vlasenko B, Prylipko D, Böck R, Wendemuth A (2014) Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications. Comput Speech Lang (Article in press)

Walker M, Langkilde I, Wright J, Gorin A, Litman D (2000) Learning to predict problematic situations in a spoken dialogue system: experiments with How May I Help You? In: Proceedings of NAACL'00, pp 210–217

Wilks Y (2010) Close engagements with artificial companions: key social, psychological, ethical and design issues. John Benjamins, Amsterdam

Williams CE, Stevens KN (1972) Emotions and speech: some acoustical correlates. J Acoust Soc Am 52(4B):1238–1250

Wöllmer M, Eyben F, Reiter S, Schuller B, Cox C, Douglas-Cowie E, Cowie R (2008) Abandoning emotion classes—towards continuous emotion recognition with modelling of long-range dependencies. In: Proceedings of Interspeech’08, pp 597–600

Wolters M, Georgila K, Moore JD, MacPherson SE (2009) Being old doesn’t mean acting old: how older users interact with spoken dialog systems. ACM Trans Access Comput 2(1):1–39

Young S, Evermann G, Gales M, Hain T, Kershaw D, Liu X, Moore G, Odell J, Ollason D, Povey D, Valtchev V, Woodland P (2006) The HTK book (for HTK Version 3.4). Cambridge University Engineering Department, Cambridge

Zeng Z, Tu J, Liu M, Huang T, Pianfetti B, Roth D, Levinson S (2007) Audio-visual affect recognition. IEEE Trans Multimed 9(2):424–428

Zeng Z, Pantic M, Roisman GI, Huang TS (2009) A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans Pattern Anal Mach Intell 31(1):39–58