A comparative study of model-based adaptation techniques for a compact speech recognizer
Abstract
Many techniques for speaker adaptation have been successfully applied to automatic speech recognition. This paper compares the performance of several adaptation methods with respect to their memory requirements and processing demands. For adaptation of a compact acoustic model with 4k densities, eigenvoices and structural MAP (SMAP) are investigated alongside the well-known techniques of MAP (maximum a posteriori) and MLLR (maximum likelihood linear regression) adaptation. Experimental results are reported for unsupervised on-line adaptation with amounts of adaptation data ranging from 4 to 500 words per speaker. The results show that for small amounts of adaptation data it might be more efficient to employ a larger baseline acoustic model without adaptation. Eigenvoices achieve the lowest word error rates of all adaptation techniques, but SMAP presents a good compromise between memory requirement and accuracy.
Keywords
#Adaptation model #Speech recognition #Loudspeakers #Automatic speech recognition #Maximum likelihood linear regression #Laboratories #Error analysis #Command and control systems #Degradation #Regression tree analysis