Multiclass support vector machines for environmental sounds classification in visual domain based on log-Gabor filters

International Journal of Speech Technology - Volume 16 - Pages 203-213 - 2012
Souli Sameh1,2, Zied Lachiri2
1Signal, Image and Pattern Recognition Research Unit, Dept. of Electrical Engineering, ENIT, Le Belvédère, Tunisia
2Dept. of Physics and Instrumentation, INSAT, Centre Urbain, Tunisia

Abstract

This paper presents an approach aimed at recognizing environmental sounds for surveillance and security applications. We propose a robust environmental sound classification approach based on spectrogram features derived from log-Gabor filters. This approach includes three methods. In the first two methods, the spectrograms are passed through appropriate log-Gabor filter banks, and the outputs are averaged and subjected to an optimal feature selection procedure based on a mutual information criterion. The third method follows the same steps but applies them only to three patches extracted from each spectrogram. To investigate the accuracy of the proposed methods, we conduct experiments using a large database containing 10 environmental sound classes. The classification results based on multiclass support vector machines show that the second method is the most efficient, with an average classification accuracy of 89.62%.
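The pipeline summarized above (log-Gabor filtering of the spectrogram, averaging of the filter outputs, mutual-information feature selection, multiclass SVM) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the filter-bank parameters (4 scales, 6 orientations, sigma/f0 = 0.55), the number of retained features, the RBF-kernel SVM settings, and the random spectrograms and labels are all hypothetical stand-ins for demonstration.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def log_gabor_bank(shape, n_scales=4, n_orient=6, min_wavelength=3, mult=2.1, sigma_f=0.55):
    """Build a bank of 2-D log-Gabor filters defined in the frequency domain."""
    rows, cols = shape
    u = np.fft.fftshift(np.fft.fftfreq(cols))
    v = np.fft.fftshift(np.fft.fftfreq(rows))
    U, V = np.meshgrid(u, v)
    radius = np.sqrt(U**2 + V**2)
    radius[rows // 2, cols // 2] = 1.0          # avoid log(0) at the DC bin
    theta = np.arctan2(-V, U)

    filters = []
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult**s)   # centre frequency of this scale
        radial = np.exp(-(np.log(radius / f0))**2 / (2 * np.log(sigma_f)**2))
        radial[rows // 2, cols // 2] = 0.0      # zero response at DC
        for o in range(n_orient):
            angle = o * np.pi / n_orient
            # angular Gaussian around the filter orientation (wrap-safe difference)
            d_theta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            angular = np.exp(-d_theta**2 / (2 * (np.pi / n_orient / 1.5)**2))
            filters.append(radial * angular)
    return filters

def log_gabor_features(spectrogram, filters):
    """Filter the spectrogram with each log-Gabor filter and average the magnitude response."""
    F = np.fft.fftshift(np.fft.fft2(spectrogram))
    return np.array([np.mean(np.abs(np.fft.ifft2(np.fft.ifftshift(F * g)))) for g in filters])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 40 random 64x64 "spectrograms" spread over 4 classes.
    spectrograms = rng.random((40, 64, 64))
    labels = np.repeat(np.arange(4), 10)

    bank = log_gabor_bank((64, 64))
    X = np.array([log_gabor_features(s, bank) for s in spectrograms])

    # Mutual-information feature selection followed by a multiclass (one-vs-one) RBF SVM.
    clf = make_pipeline(StandardScaler(),
                        SelectKBest(mutual_info_classif, k=12),
                        SVC(kernel="rbf", decision_function_shape="ovo"))
    clf.fit(X, labels)
    print("training accuracy:", clf.score(X, labels))

In this sketch each spectrogram is reduced to one averaged value per filter (here 4 x 6 = 24 features), half of which are kept by the mutual-information criterion before the one-vs-one SVM is trained.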
