A review of affective computing: From unimodal analysis to multimodal fusion

Information Fusion - Volume 37 - Pages 98-125 - 2017
Soujanya Poria1, Erik Cambria2, Rajiv Bajpai3, Amir Hussain1
1School of Natural Sciences, University of Stirling, UK
2School of Computer Science and Engineering, Nanyang Technological University, Singapore
3Temasek Laboratories, Nanyang Technological University, Singapore

Abstract

Keywords


References

Balazs, 2016, Opinion mining and information fusion: a survey, Inf. Fusion, 27

Sun, 2017, A review of natural language processing techniques for opinion mining systems, Inf. Fusion, 36

Cambria, 2014, Guest editorial: big social data analysis, Knowl.-Based Syst., 69, 1, 10.1016/j.knosys.2014.07.002

Rosas, 2013, Multimodal sentiment analysis of spanish online videos, IEEE Intell. Syst., 28, 38, 10.1109/MIS.2013.9

Qi, 2001, Multisensor data fusion in distributed sensor networks using mobile agents, 11

Morency, 2011, Towards multimodal sentiment analysis: harvesting opinions from the web, 169

Shimojo, 2001, Sensory modalities are not separate modalities: plasticity and interactions, Curr. Opin. Neurobiol., 11, 505, 10.1016/S0959-4388(00)00241-5

D’mello, 2015, A review and meta-analysis of multimodal affect detection systems, ACM Comput. Surv., 47, 43

Zeng, 2009, A survey of affect recognition methods: audio, visual, and spontaneous expressions, IEEE Trans. Pattern Anal. Mach. Intell., 31, 39, 10.1109/TPAMI.2008.52

Darwin, 1872

Ekman, 1970, Universal facial expressions of emotion, California Mental Health Res. Digest, 8, 151

Parrott, 2001

Dalgleish, 1999

Prinz, 2004

Russell, 2003, Core affect and the psychological construction of emotion, Psychol. Rev., 110, 145, 10.1037/0033-295X.110.1.145

Osgood, 1952, The nature and measurement of meaning, Psychol. Bull., 49, 197, 10.1037/h0055737

Russell, 1979, Affective space is bipolar, J. Personality Social Psychol., 37, 345, 10.1037/0022-3514.37.3.345

Whissell, 1989, The dictionary of affect in language, Emotion, 4, 94

Plutchik, 1980

Freitas, 2009, Facial expression: the effect of the smile in the treatment of depression. empirical study with Portuguese subjects, Emotional Expression, 127

Mehrabian, 1996, Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament, Curr. Psychol., 14, 261, 10.1007/BF02686918

Fontaine, 2007, The world of emotions is not two-dimensional, Psychol. Sci., 18, 1050, 10.1111/j.1467-9280.2007.02024.x

Cochrane, 2009, Eight dimensions for the emotions, Social Sci. Inf., 48, 379, 10.1177/0539018409106198

Cambria, 2012, The hourglass of emotions, 7403, 144

Cambria, 2015, AffectiveSpace 2: enabling affective intuition for concept-level sentiment analysis, 508

Pérez-Rosas, 2013, Utterance-level multimodal sentiment analysis, 973

Wöllmer, 2013, YouTube movie reviews: sentiment analysis in an audio-visual context, Intell. Syst. IEEE, 28, 46, 10.1109/MIS.2013.34

Douglas-Cowie, 2007, The humaine database: addressing the collection and annotation of naturalistic and induced emotional data, 488

Douglas-Cowie, 2000, A new emotion database: considerations, sources and scope, 39

McKeown, 2012, The SEMAINE database: annotated multimodal records of emotionally colored conversations between a person and a limited agent, Affective Comput. IEEE Trans., 3, 5, 10.1109/T-AFFC.2011.20

Douglas-Cowie, 2008, The sensitive artificial listener: an induction technique for generating emotionally coloured conversation, LREC Workshop Corpora Res. Emot. Affect

Busso, 2008, IEMOCAP: interactive emotional dyadic motion capture database, Lang. Resour. Eval., 42, 335, 10.1007/s10579-008-9076-6

Martin, 2006, The eNTERFACE'05 audio-visual emotion database

P. Ekman, W. Friesen, Facial action coding system investigator's guide, 1978.

Izard, 1983

Kring, 1991, The facial expression coding system (FACES): a user's guide, Unpublished manuscript

Ekman, 2002, FACS investigator's guide, A Human Face

P. Ekman, E. Rosenberg, J. Hager, Facial action coding system affect interpretation dictionary (FACSAID), 1998.

Ekman, 1997

Matsumoto, 1992, More evidence for the universality of a contempt expression, Motivation Emotion, 16, 363, 10.1007/BF00992972

Rinn, 1984, The neuropsychology of facial expression: a review of the neurological and psychological mechanisms for producing facial expressions, Psychol. Bull., 95, 52, 10.1037/0033-2909.95.1.52

Bartlett, 1999, Measuring facial expressions by computer image analysis, Psychophysiology, 36, 253, 10.1017/S0048577299971664

Breidt, 2003, Facial animation based on 3D scans and motion capture, SIGGRAPH'03 Sketches and Applications

Parke, 2008

Tao, 1999, Compression of MPEG-4 facial animation parameters for transmission of talking heads, Circuits Syst. Video Technol. IEEE Trans., 9, 264, 10.1109/76.752094

Yacoob, 1994, Computing spatio-temporal representations of human faces, 70

Black, 1995, Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion, 374

Zhang, 1999, Feature-based facial expression recognition: sensitivity analysis and experiments with a multilayer perceptron, Int. J. Pattern Recognit. Artif. Intell., 13, 893, 10.1142/S0218001499000495

Haro, 2000, Detecting and tracking eyes by using their physiological properties, dynamics, and appearance, 1, 163

Jones, 1998, Multidimensional morphable models, 683

Cootes, 2001, Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell., 23, 681, 10.1109/34.927467

Donato, 1999, Classifying facial actions, Pattern Anal. Mach. Intell. IEEE Trans., 21, 974, 10.1109/34.799905

Tian, 2001, Recognizing action units for facial expression analysis, Pattern Anal. Mach. Intell. IEEE Trans., 23, 97, 10.1109/34.908962

Fasel, 2000, Recognition of asymmetric facial action unit activities and intensities, 1, 1100

Lyons, 1999, Automatic classification of single facial images, IEEE Trans. Pattern Anal. Mach. Intell., 21, 1357, 10.1109/34.817413

Littlewort, 2006, Dynamics of facial expression extracted automatically from video, Image Vision Comput., 24, 615, 10.1016/j.imavis.2005.09.011

Cohen, 2003, Facial expression recognition from video sequences: temporal and static modeling, Comput. Vision Image Underst., 91, 160, 10.1016/S1077-3142(03)00081-X

Wang, 2004, Real time facial expression recognition with adaboost, 3, 926

Lanitis, 1995, Automatic face identification system using flexible appearance models, Image Vision Comput., 13, 393, 10.1016/0262-8856(95)99726-H

Cootes, 1995, Active shape models - their training and application, Comput. Vision Image Underst., 61, 38, 10.1006/cviu.1995.1004

Blanz, 1999, A morphable model for the synthesis of 3D faces, 187

Ohta, 1998, Recognition of facial expressions using muscle-based feature models, 2, 1379

Cohen, 2003, Learning Bayesian network classifiers for facial expression recognition using both labeled and unlabeled data, 1, I

Kimura, 1997, Facial expression recognition and its degree estimation, 295

Verma, 2005, Quantification of facial expressions using high-dimensional shape transformations, J. Neurosci. Methods, 141, 61, 10.1016/j.jneumeth.2004.05.016

Baltrušaitis, 2012, 3D constrained local model for rigid and non-rigid facial tracking, 2610

Morency, 2008, Generalized adaptive view-based appearance model: Integrated framework for monocular head pose estimation, 1

Yeasin, 2004, From facial expression to level of interest: a spatio-temporal approach, 2, II

Lien, 2000, Detection, tracking, and classification of action units in facial expression, Robot. Auton. Syst., 31, 131, 10.1016/S0921-8890(99)00103-7

Chang, 2004, Probabilistic expression analysis on manifolds, 2, II

Kring, 2007, The facial expression coding system (FACES): development, validation, and utility, Psychol. Assess., 19, 210, 10.1037/1040-3590.19.2.210

Davatzikos, 2001, Measuring biological shape using geometry-based shape transformations, Image Vision Comput., 19, 63, 10.1016/S0262-8856(00)00056-1

Wen, 2003, Capturing subtle facial motions in 3D face tracking, 1343

Pantic, 2000, Expert system for automatic analysis of facial expressions, Image Vision Comput., 18, 881, 10.1016/S0262-8856(00)00034-2

Pantic, 2000, Automatic analysis of facial expressions: The state of the art, Pattern Anal. Mach. Intell. IEEE Trans., 22, 1424, 10.1109/34.895976

Fasel, 2003, Automatic facial expression analysis: a survey, Pattern Recognit., 36, 259, 10.1016/S0031-3203(02)00052-3

De Meijer, 1989, The contribution of general features of body movement to the attribution of emotions, J. Nonverbal Behav., 13, 247, 10.1007/BF00990296

Kapur, 2005, Gesture-based affective computing on motion capture data, 1

Piana, 2013, A set of full-body movement features for emotion recognition to help children affected by autism spectrum condition

Piana, 2014, Real-time automatic emotion recognition from body gestures, arXiv preprint arXiv:1402.5047

Caridakis, 2007, Multimodal emotion recognition from expressive faces, body gestures and speech, 375

Balomenos, 2004, Emotion analysis in man-machine interaction systems, 318

Hinton, 2006, A fast learning algorithm for deep belief nets, Neural Comput., 18, 1527, 10.1162/neco.2006.18.7.1527

Krizhevsky, 2012, Imagenet classification with deep convolutional neural networks, 1097

LeCun, 2010, Convolutional networks and applications in vision, 253

Kavukcuoglu, 2010, Learning convolutional feature hierarchies for visual recognition, 1090

Hamel, 2010, Learning features from music audio with deep belief networks, 339

Chaturvedi, 2016, Learning word dependencies in text by means of a deep recurrent belief network, Knowl.-Based Syst., 108, 144, 10.1016/j.knosys.2016.07.019

Hinton, 2010, A practical guide to training restricted boltzmann machines, Momentum, 9, 926

Xu, 2014, Visual sentiment prediction with deep convolutional neural networks, arXiv preprint arXiv:1411.5731

You, 2015, Robust image sentiment analysis using progressively trained and domain transferred deep networks, arXiv preprint arXiv:1509.06041

Xu, 2015, Heterogeneous knowledge transfer in video emotion recognition, attribution and summarization, arXiv preprint arXiv:1511.04798

Tran, 2014, Learning spatiotemporal features with 3D convolutional networks, arXiv preprint arXiv:1412.0767

Poria, 2016, Convolutional MKL based multimodal emotion recognition and sentiment analysis

Wu, 2009, Emotion perception and recognition from speech, 93

Morrison, 2007, Ensemble methods for spoken emotion recognition in call-centres, Speech Commun., 49, 98, 10.1016/j.specom.2006.11.004

Wu, 2011, Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels, IEEE Trans. Affective Comput., 2, 10, 10.1109/T-AFFC.2010.16

Murray, 1993, Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion, J. Acoust. Soc. Am., 93, 1097, 10.1121/1.405558

Luengo, 2005, Automatic emotion recognition using prosodic parameters, 493

Koolagudi, 2011, Speech emotion recognition using segmental level prosodic analysis, 1

Västfjäll, 2002, Emotion in product sound design, Proceedings of Journées Design Sonore

Batliner, 2003, How to find trouble in communication, Speech Commun., 40, 117, 10.1016/S0167-6393(02)00079-1

Lee, 2005, Toward detecting emotions in spoken dialogs, IEEE Trans. Speech Audio Process., 13, 293, 10.1109/TSA.2004.838534

Hirschberg, 2005, Distinguishing deceptive from non-deceptive speech, 1833

Devillers, 2005, Challenges in real-life emotion annotation and machine learning based detection, Neural Netw., 18, 407, 10.1016/j.neunet.2005.03.007

Vogt, 2005, Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition, 474

Eyben, 2009, openEAR - introducing the Munich open-source emotion and affect recognition toolkit, 1

Levenson, 1994, Human emotion: a functional view, Nat. Emotion, 1, 123

Datcu, 2008, Semantic audio-visual data fusion for automatic emotion recognition, Euromedia’2008

Dellaert, 1996, Recognizing emotion in speech, 3, 1970

Johnstone, 1996, Emotional speech elicited using computer games, 3, 1985

Chen, 2000

Navas, 2006, An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS, Audio Speech Lang. Process. IEEE Trans., 14, 1117, 10.1109/TASL.2006.876121

El Ayadi, 2011, Survey on speech emotion recognition: features, classification schemes, and databases, Pattern Recognit., 44, 572, 10.1016/j.patcog.2010.09.020

Atassi, 2008, A speaker independent approach to the classification of emotional vocal expressions, 2, 147

Burkhardt, 2005, A database of German emotional speech, 5, 1517

Pudil, 1994, Floating search methods for feature selection with nonmonotonic criterion functions, 2, 279

Scherer, 1996, Adding the affective dimension: a new look in speech analysis and synthesis

Huang, 2014, Speech emotion recognition using CNN, 801

Graves, 2005, Bidirectional LSTM networks for improved phoneme classification and recognition, 799

Eyben, 2010, On-line emotion recognition in a 3-d activation-valence-time continuum using acoustic and linguistic cues, J. Multimodal User Interfaces, 3, 7, 10.1007/s12193-009-0032-6

Anand, 2015, Convoluted feelings: convolutional and recurrent nets for detecting emotion from audio data

Han, 2014, Speech emotion recognition using deep neural network and extreme learning machine, 223

Tajadura-Jiménez, 2008, Auditory-induced emotion: a neglected channel for communication in human-computer interaction, 63

Vogt, 2008, Automatic recognition of emotions from speech: a review of the literature and recommendations for practical realisation, 75

Strapparava, 2004, WordNet-Affect: an affective extension of WordNet, 4, 1083

Alm, 2005, Emotions from text: machine learning for text-based emotion prediction, 579

Mishne, 2005, Experiments with mood classification in blog posts, 19, 321

Oneto, 2016, Statistical learning theory and ELM for big social data analysis, IEEE Comput. Intell. Mag., 11, 45, 10.1109/MCI.2016.2572540

Yang, 2007, Building emotion lexicon from weblog corpora, 133

Chaumartin, 2007, UPAR7: a knowledge-based system for headline sentiment tagging, 422

Esuli, 2006, SentiWordNet: a publicly available lexical resource for opinion mining, 6, 417

Lin, 2007, What emotions do news articles trigger in their readers?, 733

Hu, 2004, Mining and summarizing customer reviews, 168

Cambria, 2016, Affective computing and sentiment analysis, IEEE Intell. Syst., 31, 102, 10.1109/MIS.2016.31

Pang, 2002, Thumbs up?: sentiment classification using machine learning techniques, 79

Socher, 2013, Recursive deep models for semantic compositionality over a sentiment treebank, 1631, 1642

Yu, 2003, Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences, 129

Melville, 2009, Sentiment analysis of blogs by combining lexical knowledge with text classification, 1275

Turney, 2002, Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews, 417

Hu, 2013, Unsupervised sentiment analysis with emotional signals, 607

Gangemi, 2014, Frame-based detection of opinion holders and topics: a model and a tool, Comput. Intell. Mag. IEEE, 9, 20, 10.1109/MCI.2013.2291688

Cambria, 2015

Kanayama, 2006, Fully automatic lexicon expansion for domain-oriented sentiment analysis, 355

Blitzer, 2007, Biographies, Bollywood, boom-boxes and blenders: domain adaptation for sentiment classification, 7, 440

Pan, 2010, Cross-domain sentiment classification via spectral feature alignment, 751

Bollegala, 2013, Cross-domain sentiment classification using a sentiment sensitive thesaurus, Knowl. Data Eng. IEEE Trans., 25, 1719, 10.1109/TKDE.2012.103

Cambria, 2016, SenticNet 4: a semantic resource for sentiment analysis based on conceptual primitives, 2666

Wu, 2011, Sentiment value propagation for an integral sentiment dictionary based on commonsense knowledge, 75

Chenlo, 2014, An empirical study of sentence features for subjectivity and polarity classification, Inf. Sci., 280, 275, 10.1016/j.ins.2014.05.009

Shah, 2016, Leveraging multimodal information for event summarization and concept-level sentiment analysis, Knowl.-Based Syst., 108, 102, 10.1016/j.knosys.2016.05.022

Gezici, 2013, SU-Sentilab: a classification system for sentiment analysis in Twitter, 471

Poria, 2016, A deeper look into sarcastic tweets using deep convolutional neural networks, 1601

Bravo-Marquez, 2014, Meta-level sentiment models for big social data analysis, Knowl.-Based Syst., 69, 86, 10.1016/j.knosys.2014.05.016

Jaiswal, 2016, The truth and nothing but the truth: multimodal analysis for deception detection

Xie, 2016, Incorporating sentiment into tag-based user profiles and resource profiles for personalized search in folksonomy, Inf. Process. Manage., 52, 61, 10.1016/j.ipm.2015.03.001

Scharl, 2016, Analyzing the public discourse on works of fiction – detection and visualization of emotion in online coverage about HBO's Game of Thrones, Inf. Process. Manage., 52, 129, 10.1016/j.ipm.2015.02.003

Egger, 2017, Consumer-oriented tech mining: Integrating the consumer perspective into organizational technology intelligence – the case of autonomous driving, 10.24251/HICSS.2017.133

Poria, 2014, Sentic patterns: dependency-based rules for concept-level sentiment analysis, Knowl.-Based Syst., 69, 45, 10.1016/j.knosys.2014.05.005

Cambria, 2009, Common sense computing: from the society of mind to digital intuition and beyond, 5707, 252

Hatzivassiloglou, 1997, Predicting the semantic orientation of adjectives, 174

Jia, 2009, The effect of negation on sentiment analysis and retrieval effectiveness, 1827

Reyes, 2014, On the difficulty of automatically detecting irony: beyond a simple case of negation, Knowl. Inf. Syst., 40, 595, 10.1007/s10115-013-0652-8

Chawla, 2013, IITB-sentiment-analysts: participation in sentiment analysis in Twitter SemEval 2013 task, 2, 495

Polanyi, 2004, Sentential structure and discourse parsing, 80

Wolf, 2005, Representing discourse coherence: a corpus-based study, Comput. Ling., 31, 249, 10.1162/0891201054223977

Wellner, 2009, Classification of discourse coherence relations: an exploratory study using multiple knowledge sources, 117

Ramesh, 2010, Identifying discourse connectives in biomedical text, 2010, 657

Liu, 2012, Sentiment analysis and opinion mining, Synth. Lect. Human Lang. Technol., 5, 1, 10.2200/S00416ED1V01Y201204HLT016

Moilanen, 2007, Sentiment composition, 378

Ding, 2008, A holistic lexicon-based approach to opinion mining, 231

Poria, 2016, Aspect extraction for opinion mining with a deep convolutional neural network, Knowl.-Based Syst., 108, 42, 10.1016/j.knosys.2016.06.009

Ng, 1997, Feature selection, perceptron learning, and a usability case study for text categorization, 31, 67

Kim, 2000, Text filtering by boosting naive Bayes classifiers, 168

Jordan, 2002, On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes, Adv. Neural Inf. Process. Syst., 14, 841

Li, 2006, Sentence similarity based on semantic nets and corpus statistics, Knowl. Data Eng. IEEE Trans., 18, 1138, 10.1109/TKDE.2006.130

Phan, 2008, Learning to classify short and sparse text & web with hidden topics from large-scale data collections, 91

Sahlgren, 2004, Using bag-of-concepts to improve the performance of support vector machines in text categorization, 487

Wang, 2014, Concept-based short text classification and ranking, 1069

Zhang, 2007, Semantic text classification of emergent disease reports, 629

Wu, 2014, Using relation selection to improve value propagation in a ConceptNet-based sentiment dictionary, Knowl.-Based Syst., 69, 100, 10.1016/j.knosys.2014.04.043

Cambria, 2014, Jumping NLP curves: a review of natural language processing research, IEEE Comput. Intell. Mag., 9, 48, 10.1109/MCI.2014.2307227

Wilson, 2005, Recognizing contextual polarity in phrase-level sentiment analysis, 347

Asher, 2009, Appraisal of opinion expressions in discourse, Lingvisticæ Investigationes, 32, 279, 10.1075/li.32.2.10ash

Narayanan, 2009, Sentiment analysis of conditional sentences, 180

Collobert, 2011, Natural language processing (almost) from scratch, J. Mach. Learn. Res., 12, 2493

Kalchbrenner, 2014, A convolutional neural network for modelling sentences, CoRR, abs/1404.2188

Glorot, 2011, Domain adaptation for large-scale sentiment classification: a deep learning approach, 513

Poria, 2015, Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis, 2539

Kim, 2014, Convolutional neural networks for sentence classification, arXiv preprint arXiv:1408.5882

Mikolov, 2013, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781

Chuang, 2004, Multi-modal emotion recognition from speech and text, Comput. Ling. Chinese Lang. Process., 9, 45

Forbes-Riley, 2004, Predicting emotion in spoken dialogue from multiple knowledge sources, 201

Litman, 2004, Predicting student emotions in computer-human tutoring dialogues, 351

Rigoll, 2005, Speech emotion recognition exploiting acoustic and linguistic information sources, Proc. SPECOM, Patras, Greece, 61

Litman, 2006, Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors, Speech Commun., 48, 559, 10.1016/j.specom.2005.09.008

Seppi, 2008, Patterns, prototypes, performance: classifying emotional user states, 601

Schuller, 2011, Recognizing affect from linguistic information in 3D continuous space, Affective Comput. IEEE Trans., 2, 192, 10.1109/T-AFFC.2011.17

Rozgic, 2012, Speech, Language & Multimedia Technologies, Raytheon BBN Technologies, Cambridge, MA, USA, 1

Savran, 2012, Combining video, audio and lexical indicators of affect in spontaneous conversation via particle filtering, 485

Sarkar, 2014, Feature analysis for computational personality recognition using YouTube personality data set, 11

Alam, 2014, Predicting personality traits using multimodal information, 15

Ellis, 2014, Why we watch the news: a dataset for exploring sentiment in broadcast video news, 104

Siddiquie, 2015, Exploiting multimodal affect and semantics to identify politically persuasive web videos, 203

Poria, 2015, Towards an intelligent framework for multimodal affective data analysis, Neural Netw., 63, 104, 10.1016/j.neunet.2014.10.005

Cai, 2015, Convolutional neural networks for multimedia sentiment analysis, 159

Ji, 2015, Cross-modality sentiment analysis for social multimedia, 28

Yamasaki, 2015, Prediction of user ratings of oral presentations using label relations, 33

Monkaresi, 2012, Classification of affects using head movement, skin color features and physiological signals, 2664

Wang, 2014, Hybrid video emotional tagging using users' EEG and video content, Multimedia Tools Appl., 72, 1257, 10.1007/s11042-013-1450-8

Busso, 2004, Analysis of emotion recognition using facial expressions, speech and multimodal information, 205

Chen, 2005, Visual/acoustic emotion recognition, 1468

Gunes, 2005, Fusing face and body display for bi-modal emotion recognition: single frame analysis and multi-frame post integration, 102

Hoch, 2005, Bimodal fusion of emotional data in an automotive environment, 2, ii

Kapoor, 2005, Multimodal affect recognition in learning environments, 677

Kim, 2007

Wang, 2008, Recognizing human emotional state from audiovisual signals, Multimedia IEEE Trans., 10, 936, 10.1109/TMM.2008.927665

Zeng, 2005, Multi-stream confidence analysis for audio-visual affect recognition, 964

Gunes, 2005, Affect recognition from face and body: early fusion vs. late fusion, 4, 3437

Pal, 2006, Emotion detection from infant facial expressions and cries, 2

Sebe, 2006, Emotion recognition based on joint visual and audio cues, 1, 1136

Zeng, 2006, Audio-visual emotion recognition in adult attachment interview, 139

Caridakis, 2006, Modeling naturalistic affective states via facial and vocal expressions recognition, 146

D’mello, 2007, Mind and body: dialogue and posture for affect detection in learning environments, Frontiers Artif. Intell. Appl., 158, 161

Gong, 2007, Visual inference of human emotion and behaviour, 22

Han, 2007, A new information fusion method for SVM-based robotic audio-visual emotion recognition, 2656

Jong-Tae, 2007, Emotion recognition method based on multimodal sensor fusion algorithm, ISIS, Sokcho-City

Karpouzis, 2007, Modeling naturalistic affective states via facial, vocal, and bodily expressions recognition, 91

Schuller, 2007, Audiovisual recognition of spontaneous interest within conversations, 30

Shan, 2007, Beyond facial expressions: learning human emotion from body gestures, 1

Zeng, 2007, Audio-visual affect recognition, Multimedia IEEE Trans., 9, 424, 10.1109/TMM.2006.886310

Haq, 2008, Audio-visual feature selection and reduction for emotion classification

Kanluan, 2008, Audio-visual emotion recognition using an emotion space concept, 1

Metallinou, 2008, Audio-visual emotion recognition using Gaussian mixture models for face and voice, 250

Wimmer, 2008, Low-level fusion of audio, video feature for multi-modal emotion recognition, 145

Bailenson, 2008, Real-time classification of evoked emotions using facial feature tracking and physiological responses, Int. J. Human-Comput. Stud., 66, 303, 10.1016/j.ijhcs.2007.10.011

Castellano, 2008, Emotion recognition through multiple modalities: face, body gesture, speech, 92

Chetty, 2008, A multilevel fusion approach for audiovisual emotion recognition, 115

Emerich, 2009, Emotions recognition by speech and facial expressions analysis, 1617

Gunes, 2009, Automatic temporal segment detection and affect recognition from face and body display, Syst. Man, Cybern. Part B, 39, 64, 10.1109/TSMCB.2008.927269

Haq, 2009, Speaker-dependent audio-visual emotion recognition, 53

Khalili, 2009, Emotion recognition system using brain and peripheral signals: using correlation dimension to improve the results of EEG, 1571

Paleari, 2009, Evidence theory-based multimodal emotion recognition, 435

Rabie, 2009, Evaluation and discussion of multi-modal emotion recognition, 1, 598

D’Mello, 2010, Multimodal semi-automated affect detection from conversational cues, gross body language, and facial features, User Model. User-Adapted Interact., 20, 147, 10.1007/s11257-010-9074-4

Dy, 2010, Multimodal emotion recognition using a spontaneous Filipino emotion database, 1

Gajsek, 2010, Multi-modal emotion recognition using canonical correlations and acoustic features, 4133

Kessous, 2010, Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis, J. Multimodal User Interfaces, 3, 33, 10.1007/s12193-009-0025-5

Kim, 2010, Ensemble approaches to parametric decision fusion for bimodal emotion recognition, 460

Mansoorizadeh, 2010, Multimodal information fusion application to human emotion recognition from face and speech, Multimedia Tools Appl., 49, 277, 10.1007/s11042-009-0344-2

Wöllmer, 2010, Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling, 2362

Glodek, 2011, Multiple classifier systems for the classification of audio-visual emotional states, 359

Banda, 2011, Noise analysis in audio-visual emotion recognition, 1

Chanel, 2011, Emotion assessment from physiological signals for adaptation of game difficulty, Syst. Man Cybern. Part A, 41, 1052, 10.1109/TSMCA.2011.2116000

Cueva, 2011, Crawling to improve multimodal emotion detection, 343

Datcu, 2011, Emotion recognition using bimodal data fusion, 122

Jiang, 2011, Audio visual emotion recognition based on triple-stream dynamic Bayesian network models, 609

Lingenfelser, 2011, A systematic discussion of fusion techniques for multi-modal affect recognition tasks, 19

Nicolaou, 2011, Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space, Affective Comput. IEEE Trans., 2, 92, 10.1109/T-AFFC.2011.9

Vu, 2011, Emotion recognition based on human gesture and speech information using rt middleware, 787

Wagner, 2011, Exploring fusion methods for multimodal emotion recognition with missing data, Affective Comput. IEEE Trans., 2, 206, 10.1109/T-AFFC.2011.12

Walter, 2011, Multimodal emotion classification in naturalistic user behavior, 603

Hussain, 2012, Combining classifiers in multimodal affect detection, 103

Koelstra, 2012, DEAP: a database for emotion analysis using physiological signals, Affective Comput. IEEE Trans., 3, 18, 10.1109/T-AFFC.2011.15

Lin, 2012, Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition, Multimedia IEEE Trans., 14, 142, 10.1109/TMM.2011.2171334

Lu, 2012, Audio-visual emotion recognition with boosted coupled HMM, 1148

Metallinou, 2012, Context-sensitive learning for enhanced audiovisual emotion classification, Affective Comput. IEEE Trans., 3, 184, 10.1109/T-AFFC.2011.40

Park, 2012, Music-aided affective interaction between human and service robot, EURASIP J. Audio, Speech, Music Process., 2012, 1, 10.1186/1687-4722-2012-5

Rashid, 2013, Human emotion recognition from videos using spatio-temporal and audio features, Visual Comput., 29, 1269, 10.1007/s00371-012-0768-y

Soleymani, 2012, Multimodal emotion recognition in response to videos, Affective Comput. IEEE Trans., 3, 211, 10.1109/T-AFFC.2011.37

Tu, 2012, Bimodal emotion recognition based on speech signals and facial expression, 691

Baltrusaitis, 2013, Dimensional affect recognition using continuous conditional random fields, 1

Dobrišek, 2013, Towards efficient multi-modal emotion recognition, Int. J. Adv. Robot. Syst., 10

Glodek, 2013, Kalman filter based classifier fusion for affective state recognition, 85

Hommel, 2013, Attention and emotion based adaption of dialog systems, 215

Krell, 2013, Fusion of fragmentary classifier decisions for affective state recognition, 116

Wöllmer, 2013, LSTM-modeling of continuous emotions in an audiovisual affect recognition framework, Image Vision Comput., 31, 153, 10.1016/j.imavis.2012.03.001

Chen, 2014, An initial analysis of structured video interviews by using multimodal emotion detection, 1

Poria, 2016, Fusing audio, visual and textual clues for sentiment analysis from multimodal content, Neurocomputing, 174, 50, 10.1016/j.neucom.2015.01.095

Song, 2004, Audio-visual based emotion recognition - a new approach, 2, II

Zeng, 2006, Training combination strategy of multi-stream fused hidden Markov model for audio-visual affect recognition, 65

Petridis, 2008, Audiovisual discrimination between laughter and speech, 5117

Atrey, 2010, Multimodal fusion for multimedia analysis: a survey, Multimedia Syst., 16, 345, 10.1007/s00530-010-0182-0

Corradini, 2005, Multimodal input fusion in human-computer interaction, NATO Sci. Ser. Sub Ser. III Comput. Syst. Sci., 198, 223

Iyengar, 2003, Audio-visual synchrony for detection of monologues in video archives, 1, I

Adams, 2003, Semantic indexing of multimedia content using visual, audio, and text cues, EURASIP J. Adv. Signal Process., 2003, 1, 10.1155/S1110865703211173

Nefian, 2002, Dynamic Bayesian networks for audio-visual speech recognition, EURASIP J. Adv. Signal Process., 2002, 1, 10.1155/S1110865702206083

Nickel, 2005, A joint particle filter for audio-visual speaker tracking, 61

Potamitis, 2004, Tracking of multiple moving speakers with multiple microphone arrays, Speech Audio Process. IEEE Trans., 12, 520, 10.1109/TSA.2004.833004

Gunes, 2010, Dimensional emotion prediction from spontaneous head gestures for interaction with sensitive artificial listeners, 371

Valstar, 2015, FERA 2015 - second facial expression recognition and analysis challenge, 6, 1

Nicolaou, 2010, Automatic segmentation of spontaneous data using dimensional labels from multiple coders, 43

Chang, 2011, AMMON: a speech analysis library for analyzing affect, stress, and mental health on mobile phones, Proc. PhoneSense, 2011

Zhang, 2012, Audio-visual emotion recognition based on facial expression and affective speech, 46

Eyben, 2011, String-based audiovisual fusion of behavioural events for the assessment of dimensional affect, 322

Rahman, 2012, A personalized emotion recognition system using an unsupervised feature adaptation scheme, 5117

Jin, 2015, Speech emotion recognition with acoustic and lexical features, 4749

Rozgić, 2012, Ensemble of SVM trees for multimodal emotion recognition, 1

DeVault, 2014, SimSensei Kiosk: a virtual human interviewer for healthcare decision support, 1061

Hoque, 2011, Acted vs. natural frustration and delight: many people smile in natural frustration, 354