Adaptive training for robust ASR

M.J.F. Gales
Cambridge University Engineering Department, Cambridge, UK

Abstract

Adaptive training is a powerful technique for building speech recognition systems on nonhomogeneous data. The aim is to separate unwanted variability, such as changes in speaker, channel or acoustic environment, from the desired variability, namely the acoustic differences between words. During training, two sets of models are generated: a canonical model set that represents the desired "true" variability of the speech data, and a set of transforms that represents the unwanted variability. A canonical model set trained in this fashion should be more "compact" and more "amenable" to adaptation to a particular target condition. During recognition, a transform to the target domain is estimated; this target-specific transform is then used with the canonical model set in the recognition process. The paper gives an overview of the theory and assumptions underlying adaptive training. It also describes the use of adaptive training schemes in current state-of-the-art tasks and discusses how such schemes may be used in the future.
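The split between a canonical model and per-condition transforms can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "frames" are one-dimensional, the canonical model is one mean per class with unit variance, the transform is a single additive bias per speaker, and the adaptation data for the target speaker is labelled. The helper names (`make_speaker`, `estimate_bias`, `train_adaptive`, `classify`) are hypothetical. Practical systems would use much richer models and transforms, and usually unsupervised adaptation.

```python
import random


def make_speaker(offset, class_means, n_per_class, rng, noise=0.3):
    """Toy 1-D frames: (value, class label) pairs shifted by a speaker offset."""
    return [(rng.gauss(m + offset, noise), k)
            for k, m in enumerate(class_means)
            for _ in range(n_per_class)]


def estimate_bias(frames, means):
    """Transform estimate: mean residual of the frames against the canonical means."""
    return sum(x - means[k] for x, k in frames) / len(frames)


def train_adaptive(data, n_classes, n_iters=10):
    """Alternate between updating per-speaker biases and canonical class means.

    data maps a speaker name to a list of (value, label) frames; every class
    is assumed to occur at least once in the training data.
    """
    means = [0.0] * n_classes
    biases = {}
    for _ in range(n_iters):
        # Transform update: one bias per speaker, given the canonical means.
        biases = {s: estimate_bias(frames, means) for s, frames in data.items()}
        # Canonical update: class means from the bias-normalised frames.
        sums = [0.0] * n_classes
        counts = [0] * n_classes
        for s, frames in data.items():
            for x, k in frames:
                sums[k] += x - biases[s]
                counts[k] += 1
        means = [sums[k] / counts[k] for k in range(n_classes)]
    return means, biases


def classify(x, means, bias=0.0):
    """Pick the class whose canonical mean is closest after removing the bias."""
    return min(range(len(means)), key=lambda k: abs(x - bias - means[k]))


rng = random.Random(0)
true_means = [0.0, 4.0]  # assumed "word" means before any speaker shift
train = {name: make_speaker(off, true_means, 50, rng)
         for name, off in [("a", -3.0), ("b", 0.0), ("c", 3.0)]}
means, biases = train_adaptive(train, n_classes=2)

# Unseen target speaker: estimate its transform on half of the data,
# then recognise the other half with the canonical means.
target = make_speaker(3.5, true_means, 50, rng)
adapt, test = target[::2], target[1::2]
target_bias = estimate_bias(adapt, means)
accuracy = sum(classify(x, means, target_bias) == k for x, k in test) / len(test)
print("canonical means:", means)
print("target bias:", target_bias)
print("adapted accuracy:", accuracy)
```

In this toy setup the canonical means and the biases are only identified up to a shared constant shift, so the quantity to inspect is the spacing between the canonical means rather than their absolute values.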

Keywords

#Robustness #Automatic speech recognition #Loudspeakers #Speech recognition #Training data #Target recognition #Feature extraction #Acoustical engineering #Data engineering #Power engineering and energy
