Deep features-based speech emotion recognition for smart affective services

Multimedia Tools and Applications, Volume 78, Issue 5, Pages 5571–5589, 2019
Abdul Malik Badshah¹, Nasir Rahim¹, Noor Ullah¹, Jamil Ahmad¹, Mi Young Lee¹, Soonil Kwon¹
¹Digital Contents Research Institute, Sejong University, Seoul, Republic of Korea

References

Abdelgawad H, Shalaby A, Abdulhai B, Gutub AAA (2014) Microscopic modeling of large-scale pedestrian–vehicle conflicts in the city of Madinah, Saudi Arabia. J Adv Transp 48:507–525

Ahmad J, Muhammad K, Kwon S-I, Baik SW, Rho S (2016) Dempster-Shafer Fusion Based Gender Recognition for Speech Analysis Applications. In: Platform Technology and Service (PlatCon), 2016 International Conference on, pp 1–4

Ahmad J, Sajjad M, Rho S, Kwon S-I, Lee MY, Baik SW (2016) Determining speaker attributes from stress-affected speech in emergency situations with hybrid SVM-DNN architecture. Multimed Tools Appl 1–25. https://doi.org/10.1007/s11042-016-4041-7

Ahmad J, Fiaz M, Kwon S-I, Sodanil M, Vo B, Baik SW (2016) Gender Identification using MFCC for Telephone Applications: A Comparative Study. International Journal of Computer Science and Electronics Engineering 3(5):351–355 (2015)

Aly SA, AlGhamdi TA, Salim M, Amin HH, Gutub AA (2014) Information Gathering Schemes For Collaborative Sensor Devices. Procedia Comput Sci 32:1141–1146

Badshah AM, Ahmad J, Rahim N, Baik SW (2017) Speech Emotion Recognition from Spectrograms with Deep Convolutional Neural Network. In: Platform Technology and Service (PlatCon), 2017 International Conference on, pp 1–5

Banse R, Scherer KR (1996) Acoustic profiles in vocal emotion expression. J Pers Soc Psychol 70:614

Bengio Y, Courville A, Vincent P (2013) Representation learning: A review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35:1798–1828

Burkhardt F, Paeschke A, Rolfes M, Sendlmeier WF, Weiss B (2005) A database of German emotional speech. In: Interspeech, pp 1517–1520

Curtis S, Zafar B, Gutub A, Manocha D (2013) Right of way. Vis Comput 29:1277–1292

Deng L, Seltzer ML, Yu D, Acero A, Mohamed A-R, Hinton GE (2010) Binary coding of speech spectrograms using a deep auto-encoder. In: Interspeech, pp 1692–1695

Deng J, Zhang Z, Marchi E, Schuller B (2013) Sparse autoencoder-based feature transfer learning for speech emotion recognition. In: Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on, pp 511–516

Dennis J, Tran HD, Li H (2011) Spectrogram image feature for sound event classification in mismatched conditions. IEEE Signal Process Lett 18:130–133

El Ayadi M, Kamel MS, Karray F (2011) Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recogn 44:572–587

Engberg IS, Hansen AV, Andersen O, Dalsgaard P (1997) Design, recording and verification of a Danish emotional speech database. In: Eurospeech

Eyben F, Wöllmer M, Schuller B (2009) OpenEAR—introducing the Munich open-source emotion and affect recognition toolkit. In: Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on, pp 1–6

France DJ, Shiavi RG, Silverman S, Silverman M, Wilkes M (2000) Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Trans Biomed Eng 47:829–837

Gharavian D, Sheikhan M, Nazerieh A, Garoucy S (2012) Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Comput & Applic 21:2115–2126

Guo Z, Wang ZJ (2013) An unsupervised hierarchical feature learning framework for one-shot image recognition. IEEE Trans Multimedia 15:621–632

Gutub A, Alharthi N (2011) Improving Hajj and Umrah Services Utilizing Exploratory Data Visualization Techniques. Inf Vis 10:356–371

Guven E, Bock P (2010) Speech emotion recognition using a backward context. In: Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th, pp 1–5

Haq S, Jackson PJ, Edge J (2009) Speaker-dependent audio-visual emotion recognition. In: AVSP, pp 53–58

Hu H, Xu M-X, Wu W (2007) Fusion of global statistical and segmental spectral features for speech emotion recognition. In: INTERSPEECH, pp 2269–2272

Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R et al (2014) Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on Multimedia, pp 675–678

Kaysi I, Sayour M, Alshalalfah B, Gutub A (2012) Rapid transit service in the unique context of Holy Makkah: assessing the first year of operation during the 2010 pilgrimage season. Urban Transp XVIII Urban Transp Environ 21st Century 18:253

Kaysi I, Alshalalfah B, Shalaby A, Sayegh A, Sayour M, Gutub A (2013) Users' Evaluation of Rail Systems in Mass Events: Case Study in Mecca, Saudi Arabia. Transp Res Rec J Transp Res Board 2350:111–118

Khan MK, Zakariah M, Malik H, Choo K-KR (2017) A novel audio forensic data-set for digital multimedia forensics. Aust J Forensic Sci 1–18. https://doi.org/10.1080/00450618.2017.1296186

Kim S, Guy SJ, Hillesland K, Zafar B, Gutub AA-A, Manocha D (2015) Velocity-based modeling of physical interactions in dense crowds. Vis Comput 31:541–555

Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

Krothapalli SR, Koolagudi SG (2013) Emotion recognition using vocal tract information. In: Emotion Recognition Using Speech Features. Springer, pp 67–78

Liu P, Choo K-KR, Wang L, Huang F (2016) SVM or deep learning? A comparative study on remote sensing image classification. Soft Comput 1–13. https://doi.org/10.1007/s00500-016-2247-2

Lugger M, Janoir M-E, Yang B (2009) Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In: Signal Processing Conference, 2009 17th European, pp 1225–1229

Mao Q, Wang X, Zhan Y (2010) Speech emotion recognition method based on improved decision tree and layered feature selection. Int J Humanoid Rob 7:245–261

Mao Q, Dong M, Huang Z, Zhan Y (2014) Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Trans Multimedia 16:2203–2213

Morrison D, Wang R, De Silva LC (2007) Ensemble methods for spoken emotion recognition in call-centres. Speech Comm 49:98–112

Nanda A, Sa PK, Choudhury SK, Bakshi S, Majhi B (2017) A Neuromorphic Person Re-Identification Framework for Video Surveillance. IEEE Access 5:6471–6482

Pao T-L, Chen Y-T, Yeh J-H, Cheng Y-M, Lin Y-Y (2007) A comparative study of different weighting schemes on KNN-based emotion recognition in Mandarin speech. In: Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, pp 997–1005

Ramakrishnan S, El Emary IM (2013) Speech emotion recognition approaches in human computer interaction. Telecommun Syst 52(3):1467–1478

Raman R, Sa PK, Majhi B, Bakshi S (2016) Direction Estimation for Pedestrian Monitoring System in Smart Cities: An HMM Based Approach. IEEE Access 4:5788–5808

Rao KS, Koolagudi SG, Vempada RR (2013) Emotion recognition from speech using global and local prosodic features. Int J Speech Technol 16:143–160

Rout JK, Choo K-KR, Dash AK, Bakshi S, Jena SK, Williams KL (2017) A model for sentiment and emotion analysis of unstructured social media text. Electron Commer Res 1–19. https://doi.org/10.1007/s10660-017-9257-8

Schmidt EM, Kim YE (2011) Learning emotion-based acoustic features with deep belief networks. In: Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on, pp 65–68

Schuller B, Rigoll G, Lang M (2004) Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. In: Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP'04). IEEE International Conference on, pp I-577

Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958

Stuhlsatz A, Meyer C, Eyben F, Zielke T, Meier G, Schuller B (2011) Deep neural networks for acoustic emotion recognition: raising the benchmarks. In: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp 5688–5691

Sun R, Moore E (2011) Investigating glottal parameters and teager energy operators in emotion recognition. In: Affective Computing and Intelligent Interaction, pp 425–434

Ververidis D, Kotropoulos C (2006) Emotional speech recognition: Resources, features, and methods. Speech Comm 48:1162–1181

Wöllmer M, Metallinou A, Katsamanis N, Schuller B, Narayanan S (2012) Analyzing the memory of BLSTM neural networks for enhanced emotion classification in dyadic spoken interactions. In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp 4157–4160

Xia M, Lijiang C (2010) Speech emotion recognition based on parametric filter and fractal dimension. IEICE Trans Inf Syst 93:2324–2326

Xu Z, Luo X, Liu Y, Choo K-KR, Sugumaran V, Yen N et al (2016) From latency, through outbreak, to decline: detecting different states of emergency events using web resources. IEEE Trans Big Data PP:1. https://doi.org/10.1109/TBDATA.2016.2599935

Yen N, Zhang H, Wei X, Lu Z, Choo K-KR, Mei L et al (2017) Social Sensors Based Online Attention Computing of Public Safety Events. IEEE Trans Emerg Top Comput 5(3):403–411

Yu D, Seltzer ML, Li J, Huang J-T, Seide F (2013) Feature learning in deep neural networks: studies on speech recognition tasks. In: ICLR 2013. https://sites.google.com/site/representationlearning2013/

Yun S, Yoo CD (2012) Loss-scaled large-margin Gaussian mixture models for speech emotion classification. IEEE Trans Audio Speech Lang Process 20:585–598

NVIDIA (2017) NVIDIA/DIGITS. https://github.com/NVIDIA/DIGITS. Accessed 4-5-2017