Histogram based normalization in the acoustic feature space
Abstract
We describe a technique called histogram normalization that normalizes feature space distributions at different stages of the signal analysis front-end, namely the log-compressed filterbank vectors, the cepstrum coefficients, and the LDA (linear discriminant analysis) transformed acoustic vectors. The best results are obtained when normalizing at the filterbank stage, and in most cases there is a minor additional gain when normalization is applied sequentially at several stages. We show that histogram normalization performs best if applied both in training and in recognition, and that smoothing the target histogram obtained on the training data is also helpful. On the VerbMobil II corpus, a German large-vocabulary conversational speech recognition task, we achieve an overall reduction in word error rate of about 10% relative.
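As a rough illustration of the idea, the following minimal sketch maps one feature dimension of test data onto the distribution observed in training data by matching empirical cumulative distributions. It is an assumption-laden toy, not the authors' procedure: the function name `histogram_normalize`, the rank-based quantile mapping, and the per-dimension treatment are all hypothetical choices, and the target-histogram smoothing mentioned in the abstract is omitted.

```python
from bisect import bisect_right


def histogram_normalize(test_values, train_values):
    """Map each test value to the training value at the same quantile.

    Hypothetical sketch for a single feature dimension (for example,
    one log-compressed filterbank channel). The sorted samples act as
    empirical CDFs; no smoothing of the target histogram is applied.
    """
    test_sorted = sorted(test_values)
    train_sorted = sorted(train_values)
    n_test = len(test_sorted)
    n_train = len(train_sorted)

    normalized = []
    for x in test_values:
        # Rank of x in the test data: number of test values <= x.
        rank = bisect_right(test_sorted, x)
        # Index of the training value at the same cumulative fraction,
        # computed as ceil(rank * n_train / n_test) - 1 in integers.
        idx = (rank * n_train + n_test - 1) // n_test - 1
        normalized.append(train_sorted[idx])
    return normalized


# Toy data: the test values are shifted and scaled versions of the
# training values, so the mapping recovers the training scale.
result = histogram_normalize([10, 30, 20, 40], [0, 1, 2, 3])
```

In a front-end with several stages, a mapping of this kind would be applied separately to each vector component at the chosen stage, using the same target distribution in training and recognition.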
Keywords
#Histograms #Filter bank #Signal analysis #Cepstrum #Linear discriminant analysis #Target recognition #Smoothing methods #Training data #Speech recognition #Error analysis
