Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge
References
Ai, H., Litman, D., Forbes-Riley, K., Rotaru, M., Tetreault, J., Purandare, A., 2006. Using system and user performance features to improve emotion detection in spoken tutoring dialogs. In: Proc. Interspeech, Pittsburgh, PA, USA, pp. 797–800.
Al-Hames, M., Rigoll, G., 2006. Reduced complexity and scaling for asynchronous HMMs in a bimodal input fusion application. In: Proc. ICASSP, Toulouse, France, pp. 757–760.
Altun, 2009, Boosting selection of speech related features to improve performance of multi-class SVMs in emotion detection, Expert Systems Appl., 36, 8197, 10.1016/j.eswa.2008.10.005
Ang, J., Dhillon, R., Shriberg, E., Stolcke, A., 2002. Prosody-based automatic detection of annoyance and frustration in human–computer dialog. In: Proc. Interspeech, Denver, CO, USA, pp. 2037–2040.
Armstrong, 2007, Significance tests harm progress in forecasting, Internat. J. Forecast., 23, 321, 10.1016/j.ijforecast.2007.03.004
Arunachalam, S., Gould, D., Anderson, E., Byrd, D., Narayanan, S., 2001. Politeness and frustration language in child–machine interactions. In: Proc. Eurospeech, Aalborg, Denmark, pp. 2675–2678.
Atal, 1971, Speech analysis and synthesis by linear prediction of the speech wave, J. Acoust. Soc. Amer., 50, 637, 10.1121/1.1912679
Athanaselis, 2005, ASR for emotional speech: clarifying the issues and enhancing performance, Neural Networks, 18, 437, 10.1016/j.neunet.2005.03.008
Ayadi, M.M.H.E., Kamel, M.S., Karray, F., 2007. Speech emotion recognition using Gaussian mixture vector autoregressive models. In: Proc. ICASSP, Honolulu, HI, pp. 957–960.
Baggia, P., Burnett, D.C., Carter, J., Dahl, D.A., McCobb, G., Raggett, D., 2007. EMMA: Extensible MultiModal Annotation markup language.
Barra-Chicote, R., Fernandez, F., Lutfi, S., Lucas-Cuesta, J.M., Macias-Guarasa, J., Montero, J.M., San-Segundo, R., Pardo, J.M., 2009. Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions. In: Proc. Interspeech, Brighton, pp. 336–339.
Batliner, A., Steidl, S., Schuller, B., Seppi, D., Vogt, T., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V., 2007. The impact of F0 extraction errors on the classification of prominence and emotion. In: Proc. ICPhS, Saarbrücken, Germany, pp. 2201–2204.
Batliner, A., Fischer, K., Huber, R., Spilker, J., Nöth, E., 2000. Desperately seeking emotions: actors, wizards, and human beings. In: Proc. ISCA Workshop on Speech and Emotion, Newcastle, Northern Ireland, pp. 195–200.
Batliner, A., Buckow, J., Huber, R., Warnke, V., Nöth, E., Niemann, H., 2001. Boiling down prosody for the classification of boundaries and accents in German and English. In: Proc. Eurospeech, Aalborg, Denmark, pp. 2781–2784.
Batliner, 2003, How to find trouble in communication, Speech Comm., 40, 117, 10.1016/S0167-6393(02)00079-1
Batliner, A., Zeissler, V., Frank, C., Adelhardt, J., Shi, R.P., Nöth, E., 2003b. We are not amused – but how do you know? User states in a multi-modal dialogue system. In: Proc. Interspeech, Geneva, Switzerland, pp. 733–736.
Batliner, A., Hacker, C., Steidl, S., Nöth, E., Haas, J., 2004. From emotion to interaction: lessons from real human–machine-dialogues. In: Proc. Tutorial and Research Workshop on Affective Dialogue Systems, Kloster Irsee, Germany, pp. 1–12.
Batliner, A., Steidl, S., Hacker, C., Nöth, E., Niemann, H., 2005. Tales of tuning – prototyping for automatic classification of emotional user states. In: Proc. Interspeech, Lisbon, Portugal, pp. 489–492.
Batliner, A., Burkhardt, F., van Ballegooy, M., Nöth, E., 2006a. A taxonomy of applications that utilize emotional awareness. In: Proc. IS-LTC 2006, Ljubljana, Slovenia, pp. 246–250.
Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V., 2006b. Combining efforts for improving automatic classification of emotional user states. In: Proc. IS-LTC 2006, Ljubljana, Slovenia, pp. 240–245.
Batliner, A., Steidl, S., Nöth, E., 2007a. Laryngealizations and emotions: How many Babushkas? In: Proc. Internat. Workshop on Paralinguistic Speech – between Models and Data (ParaLing’07), Saarbrücken, Germany, pp. 17–22.
Batliner, A., Schuller, B., Schaeffler, S., Steidl, S., 2008a. Mothers, adults, children, pets — towards the acoustics of intimacy. In: Proc. ICASSP 2008, Las Vegas, NV, pp. 4497–4500.
Batliner, 2008, Private emotions vs. social interaction — a data-driven approach towards analysing emotions in speech, User Model. User-Adapted Interact., 18, 175, 10.1007/s11257-007-9039-4
Batliner, A., Seppi, D., Steidl, S., Schuller, B., 2010. Segmenting into adequate units for automatic recognition of emotion-related episodes: a speech-based approach. Advances in Human–Computer Interaction, Vol. 2010. Article ID 782802, 15 pages.
Batliner, 2011, Whodunnit – searching for the most important feature types signalling emotional user states in speech, Comput. Speech Lang., 25, 4, 10.1016/j.csl.2009.12.003
Bellman, 1961
Bengio, S., 2003. An asynchronous hidden Markov model for audio-visual speech recognition. Advances in NIPS 15.
Bengio, 1995, An input output HMM architecture, Adv. Neural Inform. Process. Systems, 7, 427
Boda, P.P., 2004. Multimodal integration in a wider sense. In: Proc. COLING 2004 Satellite Workshop on Robust and Adaptive Information Processing for Mobile Speech Interfaces, Geneva, Switzerland, pp. 22–30.
Boersma, 1993, Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, Proc. Inst. Phonetic Sci. (Univ. Amsterdam), 17, 97
Boersma, P., Weenink, D., 2005. Praat: doing phonetics by computer (version 4.3.14). <http://www.praat.org/>.
Bogert, B., Healy, M., Tukey, J., 1963. The quefrency analysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking. In: Rosenblatt, M. (Ed.), Symposium on Time Series Analysis. John Wiley & Sons, New York, pp. 209–243.
Bozkurt, E., Erzin, E., Erdem, Ç.E., Erdem, A.T., 2009. Improving automatic emotion recognition from speech signals. In: Proc. Interspeech, Brighton, pp. 324–327.
Breese, J., Ball, G., 1998. Modeling emotional state and personality for conversational agents. Technical Report MSR-TR-98-41, Microsoft.
Brendel, M., Zaccarelli, R., Schuller, B., Devillers, L., 2010. Towards measuring similarity between emotional corpora. In: Proc. 3rd ELRA Internat. Workshop on EMOTION (satellite of LREC): Corpora for Research on Emotion and Affect, Valletta, pp. 58–64.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., Weiss, B., 2005. A database of German emotional speech. In: Proc. Interspeech, Lisbon, Portugal, pp. 1517–1520.
Burkhardt, F., van Ballegooy, M., Engelbrecht, K.-P., Polzehl, T., Stegmann, J., 2009. Emotion detection in dialog systems: applications, strategies and challenges. In: Proc. ACII, Amsterdam, Netherlands, pp. 1–6.
Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C.M., Kazemzadeh, A., Lee, S., Neumann, U., Narayanan, S., 2004. Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Proc. ICMI ’04: Proc. 6th Internat. Conf. on Multimodal interfaces, New York, USA, pp. 205–211.
Campbell, N., Kashioka, H., Ohara, R., 2005. No laughing matter. In: Proc. Interspeech, Lisbon, Portugal, pp. 465–468.
Chen, L.S., Tao, H., Huang, T.S., Miyasato, T., Nakatsu, R., 1998. Emotion recognition from audiovisual information. In: Proc. IEEE Workshop on Multimedia Signal Processing, pp. 83–88.
Cheng, Y.-M., Kuo, Y.-S., Yeh, J.-H., Chen, Y.-T., Chien, C., 2006. Using recognition of emotions in speech to better understand brand slogans. In: Proc. IEEE 8th Workshop on Multimedia Signal Processing, Victoria, BC, pp. 238–242.
Cheveigne, 2002, YIN: a fundamental frequency estimator for speech and music, J. Acoust. Soc. Amer., 111, 1917, 10.1121/1.1458024
Chuang, Z.-J., Wu, C.-H., 2004. Emotion recognition using acoustic features and textual content. In: Proc. ICME, Taipei, Taiwan, pp. 53–56.
Cohen, J., 1988. Statistical Power Analysis for the Behavioural Sciences, second ed. Erlbaum, Hillsdale, NJ.
Cowie, R., Douglas-Cowie, E., Apolloni, B., Taylor, J., Romano, A., Fellenz, W., 1999. What a neural net needs to know about emotion words. In: Mastorakis, N. (Ed.), Computational Intelligence and Applications. World Scientific and Engineering Society Press, pp. 109–114.
Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., Schröder, M., 2000. Feeltrace: an instrument for recording perceived emotion in real time. In: Proc. ISCA Workshop on Speech and Emotion, Newcastle, Northern Ireland, pp. 19–24.
Cowie, 2005, Beyond emotion archetypes: databases for emotion modelling using neural networks, Neural Networks, 18, 371, 10.1016/j.neunet.2005.03.002
Cowie, 2010, Emotions: concepts and definitions
Daubechies, 1990, The wavelet transform, time–frequency localization and signal analysis, IEEE Trans. Inform. Theory, 36, 961
Davis, 1980, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoust. Speech Signal Process., 29, 917
de Gelder, 2000, The perception of emotions by ear and eye, Cognition Emotion, 14, 289, 10.1080/026999300378824
de Gelder, 1999, The combined perception of emotion from voice and face: early interaction revealed by human electric brain responses, Neurosci. Lett., 260, 133, 10.1016/S0304-3940(98)00963-X
Dellaert, F., Polzin, T., Waibel, A., 1996. Recognizing emotion in speech. In: Proc. ICSLP, Philadelphia, PA, USA, pp. 1970–1973.
Devillers, L., Vasilescu, I., Lamel, L., 2003. Emotion detection in task-oriented spoken dialogs. In: Proc. ICME 2003, IEEE, Multimedia Human–Machine Interface and Interaction, Baltimore, MD, USA, pp. 549–552.
Devillers, 2007, Real-life emotion recognition in speech, Vol. 4441/2007, 34
Devillers, L., Abrilian, S., Martin, J.-C., 2005a. Representing real-life emotions in audiovisual data with non basic emotional patterns and context features. In: Proc. ACII, Beijing, China, pp. 519–526.
Devillers, 2005, Challenges in real-life emotion annotation and machine learning based detection, Neural Networks, 18, 407, 10.1016/j.neunet.2005.03.007
Ding, H., Qian, B., Li, Y., Tang, Z., 2006. A method combining LPC-based cepstrum and harmonic product spectrum for pitch detection. In: Proc. 2006 Internat. Conf. on Intelligent Information Hiding and Multimedia, IEEE, pp. 537–540.
Dumouchel, P., Dehak, N., Attabi, Y., Dehak, R., Boufaden, N., 2009. Cepstral and long-term features for emotion recognition. In: Proc. Interspeech, Brighton, pp. 344–347.
Elliott, C., 1992. The affective reasoner: a process model of emotions in a multi-agent system. Ph.D. Thesis, Northwestern University.
Engberg, I.S., Hansen, A.V., Andersen, O., Dalsgaard, P., 1997. Design, recording and verification of a Danish emotional speech database. In: Proc. Eurospeech, Rhodes, Greece, pp. 1695–1698.
Erickson, 2004, Exploratory study of some acoustic and articulatory characteristics of sad speech, Phonetica, 63, 1, 10.1159/000091404
Eyben, F., Wöllmer, M., Schuller, B., 2009. openEAR – Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit. In: Proc. ACII, Amsterdam, Netherlands, pp. 576–581.
Eyben, F., Batliner, A., Schuller, B., Seppi, D., Steidl, S., 2010a. Cross-corpus classification of realistic emotions – some pilot experiments. In: Proc. 3rd Internat. Workshop on EMOTION (Satellite of LREC): Corpora for Research on Emotion and Affect, Valletta, pp. 77–82.
Eyben, 2010, On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues, J. Multimodal User Interfaces, 3, 7, 10.1007/s12193-009-0032-6
Eyben, F., Wöllmer, M., Schuller, B., 2010c. openSMILE – the Munich versatile and fast open-source audio feature extractor. In: Proc. ACM Multimedia, Florence, Italy, pp. 1459–1462.
Eysenck, 1960, The concept of statistical significance and the controversy about one-tailed tests, Psychol. Rev., 67, 269, 10.1037/h0048412
Fattah, S.A., Zhu, W.P., Ahmad, M.O., 2008. A cepstral domain algorithm for formant frequency estimation from noise-corrupted speech. In: Internat. Conf. on Neural Networks and Signal Processing 2008, Zhenjiang, China, pp. 114–119.
Fehr, 1984, Concept of emotion viewed from a prototype perspective, J. Exp. Psychol.: Gen., 113, 464, 10.1037/0096-3445.113.3.464
Fei, Z., Huang, X., Wu, L., 2006. Mining the relation between sentiment expression and target using dependency of words. In: Proc. 20th Pacific Asia Conf. on Language, Information and Computation (PACLIC20), Wuhan, China, pp. 257–264.
Ferguson, 2009, An effect size primer: a guide for clinicians and researchers, Prof. Psychol.: Res. Practice, 40, 532, 10.1037/a0015808
Fernandez, 2003, Modeling drivers’ speech under stress, Speech Comm., 40, 145, 10.1016/S0167-6393(02)00080-8
Fillenbaum, 1966, Memory for gist: some relevant variables, Lang. Speech, 9, 217, 10.1177/002383096600900403
Fiscus, J., 1997. A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER). In: Proc. ASRU, Santa Barbara, CA, USA, pp. 347–352.
Fleiss, 1969, Large sample standard errors of kappa and weighted kappa, Psychol. Bull., 72, 323, 10.1037/h0028106
Forbes-Riley, K., Litman, D., 2004. Predicting emotion in spoken dialogue from multiple knowledge sources. In: Proc. Human Language Technology Conf. of the North American Chap. of the Assoc. for Computational Linguistics, Boston, MA, USA, no pagination.
Frick, 1985, Communicating emotion: the role of prosodic features, Psychol. Bull., 97, 412, 10.1037/0033-2909.97.3.412
Fukunaga, 1990
Gaussier, E., Goutte, C., 2005. Relation between PLSA and NMF and implications. In: Proc. 28th Internat. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR-05), Salvador, Brazil, pp. 601–602.
Gigerenzer, 2004, Mindless statistics, J. Socio-Econ., 33, 587, 10.1016/j.socec.2004.09.033
Godbole, N., Srinivasaiah, M., Skiena, S., 2007. Large-scale sentiment analysis for news and blogs. In: Proc. Internat. Conf. on Weblogs and Social Media (ICWSM), Boulder, CO, no pagination.
Goertzel, B., Silverman, K., Hartley, C., Bugaj, S., Ross, M., 2000. The baby webmind project. In: Proc. Annual Conf. of The Society for the Study of Artificial Intelligence and the Simulation of Behaviour (AISB), no pagination.
Grimm, 2007, On the necessity and feasibility of detecting a driver’s emotional state while driving, 126
Grimm, M., Kroschel, K., Narayanan, S., 2008. The Vera am Mittag German audio–visual emotional speech database. In: Proc. IEEE Internat. Conf. on Multimedia and Expo (ICME), Hannover, Germany, pp. 865–868.
Gunes, H., Piccardi, M., 2005. Affect recognition from face and body: early fusion vs. late fusion. In: IEEE Internat. Conf. Systems, Man and Cybernetics, Vol. 4, pp. 3437–3443.
Hall, M.A., 1998. Correlation-based feature selection for machine learning. Ph.D. Thesis, Hamilton, NZ: Waikato University, Department of Computer Science.
Hansen, J., Bou-Ghazale, S., 1997. Getting started with SUSAS: a speech under simulated and actual stress database. In: Proc. EUROSPEECH-97, Vol. 4, Rhodes, Greece, pp. 1743–1746.
Harnad, 1987
Hermansky, 1990, Perceptual linear predictive (PLP) analysis for speech, J. Acoust. Soc. Amer. (JASA), 87, 1738, 10.1121/1.399423
Hess, 1996, Prosodic modules for speech recognition and understanding in Verbmobil, 363
Hirschberg, J., Liscombe, J., Venditti, J., 2003. Experiments in emotional speech. In: Proc. ISCA and IEEE Workshop on Spontaneous Speech Processing and Recognition, Tokyo, Japan, pp. 1–7.
Hui, L., Dai, B.-Q., Wei, L., 2006. A pitch detection algorithm based on AMDF and ACF. In: Proc. ICASSP, Toulouse, France, p. I.
Hyvärinen, 2001
Inanoglu, Z., Caneel, R., 2005. Emotive alert: HMM-based emotion detection in voicemail messages. In: Proc. 10th Internat. Conf. on Intelligent User Interfaces, San Diego, CA, USA, pp. 251–253.
Joachims, T., 1998. Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (Eds.), Proc. ECML-98, 10th European Conf. on Machine Learning. Springer, Heidelberg, Chemnitz, Germany, pp. 137–142.
Johnstone, 2000, Vocal communication of emotion, 220
Jolliffe, 2002
Kharat, 2008, Human emotion recognition system using optimally designed SVM with different facial feature extraction techniques, WSEAS Trans. Comput., 7
Kießling, A., 1997. Extraktion und Klassifikation prosodischer Merkmale in der automatischen Sprachverarbeitung. Berichte aus der Informatik. Shaker, Aachen, Germany.
Kim, S.-M., Hovy, E., 2005. Automatic detection of opinion bearing words and sentences. In: Companion Volume to the Proc. Internat. Joint Conf. on Natural Language Processing (IJCNLP), Jeju Island, Korea, pp. 61–66.
Kim, 2004, Emotion recognition system using short-term monitoring of physiological signals, Medical Biological Eng. Comput., 42, 419, 10.1007/BF02344719
Kim, J., André, E., Rehm, M., Vogt, T., Wagner, J., 2005. Integrating information from speech and physiological signals to achieve emotional sensitivity. In: Proc. Interspeech, Lisbon, Portugal, pp. 809–812.
Kim, E., Hyun, K., Kim, S., Kwak, Y., 2007. Emotion interactive robot focus on speaker independently emotion recognition. In: Proc. IEEE/ASME Internat. Conf. on Advanced Intelligent Mechatronics, Zurich, Switzerland, pp. 1–6.
Kockmann, M., Burget, L., Černocký, J., 2009. Brno University of Technology System for Interspeech 2009 Emotion Challenge. In: Proc. Interspeech, Brighton, pp. 348–351.
Kwon, O.-W., Chan, K., Hao, J., Lee, T.-W., 2003. Emotion recognition by speech signals. In: Proc. Interspeech, Geneva, Switzerland, pp. 125–128.
Laskowski, K., 2009. Contrasting emotion-bearing laughter types in multiparticipant vocal activity detection for meetings. In: Proc. ICASSP, IEEE, Taipei, Taiwan, pp. 4765–4768.
Lee, 2005, Toward detecting emotions in spoken dialogs, IEEE Trans. Speech Audio Process., 13, 293, 10.1109/TSA.2004.838534
Lee, 1999, Learning the parts of objects by non-negative matrix factorization, Nature, 401, 788, 10.1038/44565
Lee, C.M., Narayanan, S.S., Pieraccini, R., 2002. Combining acoustic and language information for emotion recognition. In: Proc. Interspeech, Denver, CO, USA, pp. 873–876.
Lee, C., Mower, E., Busso, C., Lee, S., Narayanan, S., 2009. Emotion recognition using a hierarchical binary decision tree approach. In: Proc. Interspeech, Brighton, pp. 320–323.
Lefter, I., Wiggers, P., Rothkrantz, L., 2010. EmoReSp: an online emotion recognizer based on speech. In: Proc. 11th Internat. Conf. on Computer Systems and Technologies (CompSysTech), Sofia, Bulgaria, pp. 287–292.
Liscombe, J., Hirschberg, J., Venditti, J., 2005a. Detecting certainness in spoken tutorial dialogues. In: Proc. Interspeech, Lisbon, Portugal, pp. 1837–1840.
Liscombe, J., Riccardi, G., Hakkani-Tür, D., 2005b. Using context to improve emotion detection in spoken dialog systems. In: Proc. Interspeech, Lisbon, Portugal, pp. 1845–1848.
Liscombe, J., Venditti, J., Hirschberg, J., 2003. Classifying subject ratings of emotional speech using acoustic features. In: Proc. Eurospeech, Geneva, Switzerland, pp. 725–728.
Litman, D., Forbes, K., 2003. Recognizing emotions from student speech in tutoring dialogues. In: Proc. ASRU, Virgin Islands, USA, pp. 25–30.
Liu, H., Lieberman, H., Selker, T., 2003. A model of textual affect sensing using real-world knowledge. In: Proc. 7th Internat. Conf. on Intelligent User Interfaces (IUI 2003), pp. 125–132.
Lizhong, 1999, Multimodal integration – a statistical view, IEEE Trans. Multimedia, 1, 334, 10.1109/6046.807953
Lovins, 1968, Development of a stemming algorithm, Mech. Transl. Comput. Linguist., 11, 22
Luengo, I., Navas, E., Hernáez, I., 2009. Combining spectral and prosodic information for emotion recognition in the INTERSPEECH 2009 Emotion Challenge. In: Proc. Interspeech, Brighton, pp. 332–335.
Lugger, M., Yang, B., 2008. Psychologically motivated multi-stage emotion classification exploiting voice quality features. In: Mihelic, F., Zibert, J. (Eds.), Speech Recognition, IN-TECH, p. 1.
Lugger, M., Yang, B., Wokurek, W., 2006. Robust estimation of voice quality parameters under real world disturbances. In: Proc. ICASSP, Toulouse, France, pp. 1097–1100.
Makhoul, 1975, Linear prediction: a tutorial review, Proc. IEEE, 63, 561, 10.1109/PROC.1975.9792
Martin, 2006, Multimodal complex emotions: gesture expressivity and blended facial expressions, Int. J. Human. Robot., 3, 1, 10.1142/S0219843606000825
Martinez, C.A., Cruz, A., 2005. Emotion recognition in non-structured utterances for human–robot interaction. In: IEEE Internat. Workshop on Robot and Human Interactive Communication, Nashville, TN, USA, pp. 19–23.
Matos, 2006, Detection of cough signals in continuous audio recordings using hidden markov models, IEEE Trans. Biomed. Eng., 1078, 10.1109/TBME.2006.873548
McGilloway, S., Cowie, R., Douglas-Cowie, E., Gielen, S., Westerdijk, M., Stroeve, S., 2000. Approaching automatic recognition of emotion from voice: A rough benchmark. In: Proc. ISCA Workshop on Speech and Emotion, Newcastle, Northern Ireland, pp. 207–212.
Meyer, D., Leisch, F., Hornik, K., 2002. Benchmarking support vector machines. Report Series No. 78, Adaptive Information Systems and Management in Economics and Management Science, 19 pages.
Missen, M., Boughanem, M., 2009. Using WordNet’s semantic relations for opinion detection in blogs. In: Advances in Information Retrieval, Lecture Notes in Computer Science, Vol. 5478/2009. Springer, pp. 729–733.
Morrison, 2007, Voting ensembles for spoken affect classification, J. Network Comput. Appl., 30, 1356, 10.1016/j.jnca.2006.09.005
Morrison, 2007, Ensemble methods for spoken emotion recognition in call-centres, Speech Comm., 49, 98, 10.1016/j.specom.2006.11.004
Morrison, 2007, Incremental learning for spoken affect classification and its application in call-centres, Int. J. Intell. Systems Technol. Appl., 2, 242
Mower, E., Metallinou, A., Lee, C.-C., Kazemzadeh, A., Busso, C., Lee, S., Narayanan, S., 2009. Interpreting ambiguous emotional expressions. In: Proc. ACII, Amsterdam, Netherlands, pp. 662–669.
Nasoz, 2004, Emotion recognition from physiological signals using wireless sensors for presence technologies, Cognition Technol. Work, 6, 4, 10.1007/s10111-003-0143-x
Nefian, A.V., Luhong, L., Xiaobo, P., Liu, X., Mao, C., Murphy, K., 2002. A coupled HMM for audio–visual speech recognition. In: Proc. ICASSP, Orlando, FL, USA, pp. 2013–2016.
Neiberg, D., Elenius, K., Laskowski, K., 2006. Emotion recognition in spontaneous speech using GMMs. In: Proc. Interspeech, Pittsburgh, PA, USA, pp. 809–812.
Nickerson, 2000, Null hypothesis significance testing: a review of an old and continuing controversy, Psychol. Methods, 5, 241, 10.1037/1082-989X.5.2.241
Nogueiras, A., Moreno, A., Bonafonte, A., Mariño, J.B., 2001. Speech emotion recognition using hidden Markov models. In: Proc. Eurospeech, Aalborg, Denmark, pp. 2267–2270.
Noll, 1967, Cepstrum pitch determination, J. Acoust. Soc. Amer. (JASA), 41, 293, 10.1121/1.1910339
Nose, T., Kato, Y., Kobayashi, T., 2007. Style estimation of speech based on multiple regression hidden semi-Markov model. In: Proc. Interspeech, Antwerp, Belgium, pp. 2285–2288.
Nöth, 2002, On the use of prosody in automatic dialogue understanding, Speech Comm., 36, 45, 10.1016/S0167-6393(01)00025-5
Nwe, 2003, Speech emotion recognition using hidden Markov models, Speech Comm., 41, 603, 10.1016/S0167-6393(03)00099-2
Pachet, 2009, Analytical features: a knowledge-based approach to audio feature generation, EURASIP J. Audio Speech Music Process., 10.1155/2009/153017
Pal, P., Iyer, A., Yantorno, R., 2006. Emotion detection from infant facial expressions and cries. In: Proc. ICASSP, Toulouse, France, pp. 809–812.
Pang, B., Lee, L., Vaithyanathan, S., 2002. Thumbs up? Sentiment classification using machine learning techniques. In: Proc. 2002 Conf. on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, PA, USA, pp. 79–86.
Pantic, 2003, Toward an affect-sensitive multimodal human–computer interaction, Proc. IEEE, 91, 1370, 10.1109/JPROC.2003.817122
Perneger, 1998, What’s wrong with Bonferroni adjustment, Brit. Med. J., 316, 1236, 10.1136/bmj.316.7139.1236
Petrushin, V., 1999. Emotion in speech: recognition and application to call centers. In: Proc. Artificial Neural Networks in Engineering (ANNIE ’99), St. Louis, MO, USA, pp. 7–10.
Picard, 2001, Toward machine emotional intelligence: Analysis of affective physiological state, IEEE Trans. Pattern Anal. Machine Intell., 23, 1175, 10.1109/34.954607
Planet, S., Iriondo, I., Socoró, J.-C., Monzo, C., Adell, J., 2009. GTM-URL contribution to the INTERSPEECH 2009 Emotion Challenge. In: Proc. Interspeech, Brighton, pp. 316–319.
Polzehl, T., Sundaram, S., Ketabdar, H., Wagner, M., Metze, F., 2009. Emotion classification in children’s speech using fusion of acoustic and linguistic features. In: Proc. Interspeech, Brighton, pp. 340–343.
Polzin, T.S., Waibel, A., 2000. Emotion-sensitive human–computer interfaces. In: Proc. ISCA Workshop on Speech and Emotion, Newcastle, Northern Ireland, pp. 201–206.
Popescu, A.-M., Etzioni, O., 2005. Extracting product features and opinions from reviews. In: Proc. Human Language Technology Conf. and the Conf. on Empirical Methods in Natural Language Processing (HLT/EMNLP), Vancouver, British Columbia, pp. 339–346.
Porter, 1980, An algorithm for suffix stripping, Program, 14, 130, 10.1108/eb046814
Pudil, 1994, Floating search methods in feature selection, Pattern Recognition Lett., 15, 1119, 10.1016/0167-8655(94)90127-9
Rabiner, 1977, On the use of autocorrelation analysis for pitch detection, IEEE Trans. Acoust. Speech Signal Process., 25, 24, 10.1109/TASSP.1977.1162905
Rahurkar, M.A., Hansen, J.H.L., 2003. Towards affect recognition: an ICA approach. In: Proc. 4th Internat. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), Nara, Japan, pp. 1017–1022.
Rong, J., Chen, Y.-P.P., Chowdhury, M., Li, G., 2007. Acoustic features extraction for emotion recognition. In: Proc. ACIS Internat. Conf. on Computer and Information Science. IEEE Computer Society, Los Alamitos, CA, pp. 419–424.
Rosch, 1975, Cognitive representations of semantic categories, J. Exp. Psychol.: Gen., 104, 192, 10.1037/0096-3445.104.3.192
Rosenberg, A., Binkowski, E., 2004. Augmenting the kappa statistic to determine interannotator reliability for multiply labeled data points. In: Dumais, D.M., Roukos, S. (Eds.), HLT-NAACL 2004: Short Papers. Association for Computational Linguistics, Boston, MA, USA, pp. 77–80.
Rozeboom, 1960, The fallacy of the null-hypothesis significance test, Psychol. Bull., 57, 416, 10.1037/h0042040
Russell, 2003, Facial and vocal expressions of emotion, Annu. Rev. Psychol., 329, 10.1146/annurev.psych.54.101601.145102
Sachs, 1967, Recognition memory for syntactic and semantic aspects of connected discourse, Percept. Psychophys., 2, 437, 10.3758/BF03208784
Said, 2010, Graded representations of emotional expressions in the left superior temporal sulcus, Front. Systems Neurosci., 4, 6, 10.3389/fnsys.2010.00006
Salzberg, 1997, On comparing classifiers: pitfalls to avoid and a recommended approach, Data Mining Knowl. Discov., 1, 317, 10.1023/A:1009752403260
Sato, 2007, Emotion recognition using mel-frequency cepstral coefficients, Inform. Media Technol., 2, 835
Scherer, 2003, Vocal expression of emotion, 433
Schiel, F., 1999. Automatic phonetic transcription of non-prompted speech. In: Proc. ICPhS, San Francisco, CA, USA, pp. 607–610.
Schröder, M., Pirker, H., Lamolle, M., 2006. First suggestions for an emotion annotation and representation language. In: Devillers, L., Martin, J.-C., Cowie, R., Douglas-Cowie, E., Batliner, A. (Eds.), Proc. Satellite Workshop of LREC 2006 on Corpora for Research on Emotion and Affect, Genoa, Italy, pp. 88–92.
Schröder, 2007, What should a generic emotion markup language be able to represent?, 440
Schröder, M., Cowie, R., Heylen, D., Pantic, M., Pelachaud, C., Schuller, B., 2008. Towards responsive sensitive artificial listeners. In: Proc. 4th Internat. Workshop on Human–Computer Conversation, Bellagio, Italy, no pagination.
Schuller, B., Rigoll, G., Lang, M., 2003. Hidden Markov model-based speech emotion recognition. In: Proc. ICASSP, Hong Kong, pp. 1–4.
Schuller, B., Rigoll, G., Lang, M., 2004. Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. In: Proc. ICASSP, Montreal, Canada, pp. 577–580.
Schuller, B., Jiménez Villar, R., Rigoll, G., Lang, M., 2005a. Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition. In: Proc. ICASSP, Philadelphia, PA, USA, pp. I:325–328.
Schuller, B., Müller, R., Lang, M., Rigoll, G., 2005b. Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensemble. In: Proc. Interspeech, Lisbon, Portugal, pp. 805–808.
Schuller, B., Reiter, S., Müller, R., Al-Hames, M., Lang, M., Rigoll, G., 2005c. Speaker independent speech emotion recognition by ensemble classification. In: Proc. ICME, Amsterdam, Netherlands, pp. 864–867.
Schuller, B., Arsić, D., Wallhoff, F., Rigoll, G., 2006. Emotion recognition in the noise applying large acoustic feature sets. In: Proc. Speech Prosody 2006, Dresden, Germany, no pagination.
Schuller, B., Rigoll, G., 2006. Timing levels in segment-based speech emotion recognition. In: Proc. Interspeech, Pittsburgh, PA, USA, pp. 1818–1821.
Schuller, B., Köhler, N., Müller, R., Rigoll, G., 2006b. Recognition of interest in human conversational speech. In: Proc. Interspeech, Pittsburgh, PA, USA, pp. 793–796.
Schuller, B., Reiter, S., Rigoll, G., 2006c. Evolutionary feature generation in speech emotion recognition. In: Proc. Internat. Conf. on Multimedia and Expo ICME 2006, Toronto, Canada, pp. 5–8.
Schuller, B., Stadermann, J., Rigoll, G., 2006d. Affect-robust speech recognition by dynamic emotional adaptation. In: Proc. Speech Prosody 2006, Dresden, Germany, no pagination.
Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V., 2007a. The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals. In: Proc. Interspeech, Antwerp, Belgium, pp. 2253–2256.
Schuller, B., Müller, R., Hörnler, B., Höthker, A., Konosu, H., Rigoll, G., 2007b. Audiovisual recognition of spontaneous interest within conversations. In: Proc. 9th Internat. Conf. on Multimodal Interfaces (ICMI), Special Session on Multimodal Analysis of Human Spontaneous Behaviour. ACM SIGCHI, Nagoya, Japan, pp. 30–37.
Schuller, B., Seppi, D., Batliner, A., Meier, A., Steidl, S., 2007c. Towards more reality in the recognition of emotional speech. In: Proc. ICASSP, Honolulu, HI, USA, pp. 941–944.
Schuller, B., Batliner, A., Steidl, S., Seppi, D., 2008a. Does affect affect automatic recognition of children’s speech? In: Proc. 1st Workshop on Child, Computer and Interaction, Chania, Greece, no pagination.
Schuller, B., Eyben, F., Rigoll, G., 2008b. Static and dynamic modelling for the recognition of non-verbal vocalisations in conversational speech. In: André, E. (Ed.), Proc. 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-based Systems (PIT 2008), Kloster Irsee, Germany, LNCS, Vol. 5078. Springer, pp. 99–110.
Schuller, B., Vlasenko, B., Arsic, D., Rigoll, G., Wendemuth, A., 2008c. Combining speech recognition and acoustic word emotion models for robust text-independent emotion recognition. In: Proc. ICME, Hannover, Germany, pp. 1333–1336.
Schuller, B., Wimmer, M., Arsic, D., Moosmayr, T., Rigoll, G., 2008d. Detection of security related affect and behaviour in passenger transport. In: Proc. Interspeech, Brisbane, Australia, pp. 265–268.
Schuller, B., Wimmer, M., Mösenlechner, L., Kern, C., Arsic, D., Rigoll, G., 2008e. Brute-forcing hierarchical functionals for paralinguistics: a waste of feature space? In: Proc. ICASSP, Las Vegas, NV, pp. 4501–4504.
Schuller, B., Rigoll, G., 2009. Recognising interest in conversational speech – comparing bag of frames and supra-segmental features. In: Proc. Interspeech, Brighton, UK, pp. 1999–2002.
Schuller, B., Batliner, A., Steidl, S., Seppi, D., 2009a. Emotion recognition from speech: putting ASR in the loop. In: Proc. ICASSP, IEEE, Taipei, Taiwan, pp. 4585–4588.
Schuller, B., Müller, R., Eyben, F., Gast, J., Hörnler, B., Wöllmer, M., Rigoll, G., Höthker, A., Konosu, H., 2009b. Being bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image Vision Comput. J. (IMAVIS) 27, 1760–1774 (Special Issue on Visual and Multimodal Analysis of Human Spontaneous Behavior).
Schuller, B., Schenk, J., Rigoll, G., Knaup, T., 2009c. The “Godfather” vs. “Chaos”: comparing linguistic analysis based on online knowledge sources and Bags-of-N-grams for movie review valence estimation. In: Proc. Internat. Conf. on Document Analysis and Recognition, Barcelona, Spain, pp. 858–862.
Schuller, B., Steidl, S., Batliner, A., 2009d. The INTERSPEECH 2009 Emotion Challenge. In: Proc. Interspeech, Brighton, UK, pp. 312–315.
Schuller, B., Vlasenko, B., Eyben, F., Rigoll, G., Wendemuth, A., 2009e. Acoustic emotion recognition: a benchmark comparison of performances. In: Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Merano, pp. 552–557.
Schuller, B., Wöllmer, M., Eyben, F., Rigoll, G., 2009f. Spectral or voice quality? Feature type relevance for the discrimination of emotion pairs. In: Hancil, S. (Ed.), The Role of Prosody in Affective Speech. Linguistic Insights, Studies in Language and Communication, Vol. 97. Peter Lang, pp. 285–307.
Schuller, B., Wöllmer, M., Moosmayr, T., Rigoll, G., 2009g. Recognition of noisy speech: a comparative survey of robust model architectures and feature enhancement. EURASIP J. Audio Speech Music Process. (JASMP), 17 pages, Article ID 942617.
Schuller, B., Burkhardt, F., 2010. Learning with synthesized speech for automatic emotion recognition. In: Proc. 35th IEEE Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Dallas, pp. 5150–5153.
Schuller, B., Weninger, F., 2010. Discrimination of speech and non-linguistic vocalizations by non-negative matrix factorization. In: Proc. ICASSP, Dallas, pp. 5054–5057.
Schuller, B., Eyben, F., Can, S., Feussner, H., 2010a. Speech in minimal invasive surgery – towards an affective language resource of real-life medical operations. In: Proc. 3rd ELRA Internat. Workshop on EMOTION (satellite of LREC): Corpora for Research on Emotion and Affect, Valletta, pp. 5–9.
Schuller, B., Metze, F., Steidl, S., Batliner, A., Eyben, F., Polzehl, T., 2010b. Late fusion of individual engines for improved recognition of negative emotions in speech – learning vs. democratic vote. In: Proc. 35th IEEE Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Dallas, pp. 5230–5233.
Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C., Narayanan, S., 2010c. The INTERSPEECH 2010 paralinguistic challenge. In: Proc. INTERSPEECH 2010, Makuhari, Japan, pp. 2794–2797.
Seppi, D., Batliner, A., Schuller, B., Steidl, S., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Amir, N., Aharonson, V., 2008a. Patterns, prototypes, performance: classifying emotional user states. In: Proc. Interspeech, Brisbane, Australia, pp. 601–604.
Seppi, D., Gerosa, M., Schuller, B., Batliner, A., Steidl, S., 2008b. Detecting problems in spoken child–computer interaction. In: Proc. 1st Workshop on Child, Computer and Interaction, Chania, Greece, no pagination.
Seppi, D., Batliner, A., Steidl, S., Schuller, B., Nöth, E., 2010. Word accent and emotion. In: Proc. Speech Prosody 2010, Chicago, IL, no pagination.
Sethu, V., Ambikairajah, E., Epps, J., 2007. Speaker normalisation for speech-based emotion detection. In: Proc. 15th Internat. Conf. on Digital Signal Processing, Cardiff, pp. 611–614.
Shami, M., Verhelst, W., 2007. Automatic classification of expressiveness in speech: a multi-corpus study. In: Müller, C. (Ed.), Speaker Classification II, Lecture Notes in Computer Science/Artificial Intelligence, Vol. 4441. Springer, Heidelberg–Berlin–New York, pp. 43–56.
Shaver, 1992, Cross-cultural similarities and differences in emotion and its representation: a prototype approach, Emotion, 175
Sood, S., Krishnamurthy, A., 2004. A robust on-the-fly pitch (otfp) estimation algorithm. In: Proc. 12th Annual ACM Internat. Conf. on Multimedia (MULTIMEDIA ’04). ACM, New York, NY, USA, pp. 280–283.
Steidl, S., Ruff, C., Batliner, A., Nöth, E., Haas, J., 2004. Looking at the last two turns, I’d say this dialogue is doomed — measuring dialogue success. In: Sojka, P., Kopeček, I., Pala, K. (Eds.), 7th Internat. Conf. on Text, Speech and Dialogue, TSD 2004, Berlin, Heidelberg, pp. 629–636.
Steidl, S., Batliner, A., Nöth, E., Hornegger, A., 2008. Quantification of segmentation and F0 errors and their effect on emotion recognition. In: 11th Internat. Conf. on Text, Speech and Dialogue, TSD 2008, pp. 525–534.
Steidl, S., 2009. Automatic classification of emotion-related user states in spontaneous children’s speech. Logos Verlag, Berlin, Germany, (Ph.D. Thesis, FAU Erlangen-Nuremberg).
Steidl, S., Schuller, B., Batliner, A., Seppi, D., 2009. The hinterland of emotions: facing the open-microphone challenge. In: Proc. ACII, Amsterdam, Netherlands, pp. 690–697.
Steidl, S., Batliner, A., Seppi, D., Schuller, B., 2010. On the impact of children’s emotional speech on acoustic and language models. EURASIP J. Audio Speech Music Process., 14. doi:10.1155/2010/783954.
Takahashi, K., 2004. Remarks on emotion recognition from bio-potential signals. In: Proc. 2nd Internat. Conf. on Autonomous Robots and Agents, pp. 186–191.
ten Bosch, 2003, Emotions, speech and the ASR framework, Speech Comm., 40, 213, 10.1016/S0167-6393(02)00083-3
Tomlinson, M.J., Russell, M.J., Brooke, N.M., 1996. Integrating audio and visual information to provide highly robust speech recognition. In: Proc. ICASSP, Atlanta, GA, USA, pp. 821–824.
Truong, K., van Leeuwen, D., 2005. Automatic detection of laughter. In: Proc. Interspeech, Lisbon, Portugal, pp. 485–488.
Ververidis, D., Kotropoulos, C., 2003. A review of emotional speech databases. In: Proc. 9th Panhellenic Conf. on Informatics (PCI 2003), Thessaloniki, Greece, pp. 560–574.
Ververidis, D., Kotropoulos, C., 2006. Fast sequential floating forward selection applied to emotional speech features estimated on DES and SUSAS data collection. In: Proc. European Signal Processing Conf. (EUSIPCO 2006), Florence, Italy, no pagination.
Vidrascu, L., Devillers, L., 2007. Five emotion classes in real-world call center data: the use of various types of paralinguistic features. In: Proc. PARALING07, pp. 11–16.
Vinciarelli, A., Pantic, M., Bourlard, H., Pentland, A., 2008. Social signals, their function, and automatic analysis: a survey. In: Proc. 10th Internat. Conf. on Multimodal Interfaces, ACM, New York, USA, pp. 61–68.
Vlasenko, B., 2009. Processing affected speech within human machine interaction. In: Proc. Interspeech, Brighton, pp. 2039–2042.
Vlasenko, B., Schuller, B., Wendemuth, A., Rigoll, G., 2007a. Combining frame and turn-level information for robust recognition of emotions within speech. In: Proc. Interspeech, Antwerp, Belgium, pp. 2249–2252.
Vlasenko, 2007, Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing, 139
Vlasenko, B., Schuller, B., Mengistu, T.K., Rigoll, G.A.W., 2008. Balancing spoken content adaptation and unit length in the recognition of emotion and interest. In: Proc. Interspeech, Brisbane, Australia, pp. 805–808.
Vogt, T., André, E., 2005. Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. In: Proc. Multimedia and Expo (ICME05), Amsterdam, Netherlands, pp. 474–477.
Vogt, T., André, E., 2009. Exploring the benefits of discretization of acoustic features for speech emotion recognition. In: Proc. Interspeech, Brighton, pp. 328–331.
Vogt, T., André, E., Bee, N., 2008. Emovoice – a framework for online recognition of emotions from voice. In: Proc. IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems (PIT 2008). Lecture Notes in Computer Science, Vol. 5078. Springer, Kloster Irsee, Germany, pp. 188–199.
Vogt, T., André, E., Wagner, J., Gilroy, S., Charles, F., Cavazza, M., 2009. Real-time vocal emotion recognition in artistic installations and interactive storytelling: Experiences and lessons learnt from CALLAS and IRIS. In: Proc. ACII, Amsterdam, Netherlands, pp. 670–677.
Wagner, J., Kim, J., André, E., 2005. From physiological signals to emotions: implementing and comparing selected methods for feature extraction and classification. In: Proc. ICME, Amsterdam, Netherlands, pp. 940–943.
Wagner, J., Vogt, T., André, E., 2007. A systematic comparison of different HMM designs for emotion recognition from acted and spontaneous speech. In: Paiva, A., Prada, R., Picard, R.W. (Eds.), Affective Computing and Intelligent Interaction. Springer, Berlin–Heidelberg, pp. 114–125.
Wang, Y., Guan, L., 2005. Recognizing human emotion from audiovisual information. In: Proc. ICASSP, Vol. 2. Philadelphia, PA, USA, pp. 1125–1128.
Wilson, T., Wiebe, J., Hwa, R., 2004. Just how mad are you? Finding strong and weak opinion clauses. In: Proc. Conf. American Association for Artificial Intelligence (AAAI), San Jose, CA, no pagination.
Wimmer, M., Schuller, B., Arsic, D., Radig, B., Rigoll, G., 2008. Low-level fusion of audio and video features for multi-modal emotion recognition. In: Proc. 3rd Internat. Conf. on Computer Vision Theory and Applications, Funchal, Portugal, pp. 145–151.
Witten, 2005
Wöllmer, M., Eyben, F., Reiter, S., Schuller, B., Cox, C., Douglas-Cowie, E., Cowie, R., 2008. Abandoning emotion classes – towards continuous emotion recognition with modelling of long-range dependencies. In: Proc. Interspeech, Brisbane, Australia, pp. 597–600.
Wöllmer, 2009, A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams, Neurocomputing, 73, 366, 10.1016/j.neucom.2009.08.005
Wöllmer, M., Eyben, F., Keshet, J., Graves, A., Schuller, B., Rigoll, G., 2009. Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks. In: Proc. ICASSP, Taipei, Taiwan, pp. 3949–3952.
Wöllmer, M., Schuller, B., Eyben, F., Rigoll, G., 2010. Combining long short-term memory and dynamic bayesian networks for incremental emotion-sensitive artificial listening. IEEE J. Select. Topics Signal Process. 4, 867–881 (Special Issue on “Speech Processing for Natural Interaction with Intelligent Environments”).
Wolpert, 1992, Stacked generalization, Neural Networks, 5, 241, 10.1016/S0893-6080(05)80023-1
Wu, 2005, Posting act tagging using transformation-based learning, 319
Wu, C.-H., Yeh, J.-F., Chuang, Z.-J., 2008a. Emotion perception and recognition from speech. In: Affective Information Processing. Springer, London, pp. 93–110.
Wu, S., Falk, T., Chan, W.-Y., 2008b. Long-term spectro-temporal information for improved automatic speech emotion classification. In: Proc. Interspeech, Brisbane, Australia, pp. 638–641.
Yi, J., Nasukawa, T., Bunescu, R., Niblack, W., 2003. Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques. In: Proc. IEEE Internat. Conf. on Data Mining (ICDM), Melbourne, FL, pp. 427–434.
You, M., Chen, C., Bu, J., Liu, J., Tao, J., 2006. Emotion recognition from noisy speech. In: Proc. ICME, Toronto, Canada, pp. 1653–1656.
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., 2006. The HTK Book (for HTK Version 3.4). Cambridge University Engineering Department.
Yu, C., Aoki, P., Woodruff, A., 2004. Detecting user engagement in everyday conversations. In: Proc. ICSLP, pp. 1329–1332.
Zeng, 2007, Audio–visual spontaneous emotion recognition, Artif. Intell. Human Comput., 72, 10.1007/978-3-540-72348-6_4
Zeng, 2007, Audio–visual affect recognition, IEEE Trans. Multimedia, 9, 424, 10.1109/TMM.2006.886310
Zeng, 2009, A survey of affect recognition methods: audio, visual, and spontaneous expressions, IEEE Trans. Pattern Anal. Machine Intell., 31, 39, 10.1109/TPAMI.2008.52
Zhe, X., Boucouvalas, A., 2002. Text-to-emotion engine for real time internet communication. In: Proc. Internat. Symp. on Communication Systems, Networks, and DSPs, Staffordshire University, pp. 164–168.
Zwicker, 1999