Adaptive training for robust ASR
Abstract
Adaptive training is a powerful technique for building speech recognition systems on nonhomogeneous data. The aim is to separate unwanted variability, such as changes in speaker, channel, or acoustic environment, from the desired variability, namely the acoustic differences between words. During training, two sets of models are generated: a canonical model set for the desired "true" variability of the speech data, and a set of transforms to represent the unwanted variability. A canonical model set trained in this fashion should be more "amenable" to adaptation to a particular target condition, and more "compact". During recognition, a transform to the target domain is estimated. This target-specific transform is then used with the canonical model set in the recognition process. The paper gives an overview of the underlying theory and assumptions used in adaptive training. It also describes the use of adaptive training schemes in current state-of-the-art tasks and discusses how such schemes may be used in the future.
Keywords
#Robustness #Automatic speech recognition #Loudspeakers #Speech recognition #Training data #Target recognition #Feature extraction #Acoustical engineering #Data engineering #Power engineering and energy