Car noise verification and applications
Abstract
This study presents audio-based vehicle verification as a new area of research. The task involves verifying the claim that an acoustic sample belongs to a given vehicle. Audio-based vehicle verification has the potential to impact research in vehicle forensics and in-vehicle speech systems. For this task, a new corpus (UTD-CAR-NOISE) is presented that consists of noise from 20 vehicles under 8 distinct noise environments (∼8 hours of data). Our approach hypothesizes that some specific environments are better suited to vehicle verification than others. To test this, four diverse in-vehicle noise conditions are identified on the basis of their frequency of occurrence. Additionally, four verification systems are proposed that differ in complexity and modeling strategy. Our evaluation shows that the condition with the A/C on and the windows closed is the most conducive to vehicle verification (98 %). The proposed systems were evaluated on approximately 100,000 trials, achieving performance in the range of 75–98 % across the different vehicle environments.