Adaptive training for robust ASR

IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '01), 2001, pp. 15-20

M.J.F. Gales¹

¹Cambridge University Engineering Department, Cambridge, UK

Abstract

Adaptive training is a powerful technique for building speech recognition systems on nonhomogeneous data. The aim is to separate unwanted variability, such as changes in speaker, channel or acoustic environment, from the desired variability: the acoustic differences between words. During training, two sets of models are generated: a canonical model set that represents the desired "true" variability of the speech data, and a set of transforms that represents the unwanted variability. A canonical model set trained in this fashion should be more "compact" and more "amenable" to being adapted to a particular target condition. During recognition, a transform to the target domain is estimated, and this target-specific transform is then used with the canonical model set in the recognition process. The paper gives an overview of the theory and assumptions underlying adaptive training. It also describes the use of adaptive training schemes in current state-of-the-art tasks and discusses how such schemes may be used in the future.
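The training and recognition loop described above can be illustrated with a deliberately tiny sketch. It assumes a one-dimensional Gaussian per word label with a shared variance, and uses a per-speaker additive bias as the transform (a much simpler stand-in for the linear transforms used in practice, such as MLLR). Word labels are assumed known, so no EM alignment is needed. Training alternates between re-estimating each speaker's transform with the canonical means fixed and re-estimating the canonical means with the transforms fixed. The function names `train_sat_bias`, `estimate_bias` and `classify` are hypothetical and not taken from the paper.

```python
from statistics import fmean


def train_sat_bias(data, n_iter=50):
    """Toy adaptive training with per-speaker bias transforms (hypothetical).

    data: list of (speaker, label, x) tuples, x a float observation.
    Assumed model: x ~ N(mu[label] + bias[speaker], var).
    Returns (mu, bias, var): canonical means, transforms, shared variance.
    """
    labels = sorted({k for _, k, _ in data})
    speakers = sorted({s for s, _, _ in data})
    # Initialise canonical means from the pooled data, transforms at zero.
    mu = {k: fmean([x for _, kk, x in data if kk == k]) for k in labels}
    bias = {s: 0.0 for s in speakers}
    for _ in range(n_iter):
        # Transform update: with the canonical model fixed, the ML bias
        # for each speaker is that speaker's mean residual.
        for s in speakers:
            bias[s] = fmean([x - mu[k] for ss, k, x in data if ss == s])
        # Canonical update: with the transforms fixed, the ML mean for each
        # label is the mean of the bias-normalised observations.
        for k in labels:
            mu[k] = fmean([x - bias[s] for s, kk, x in data if kk == k])
    # Means and biases are only defined up to a common shift; fix it so the
    # count-weighted average bias is zero (the likelihood is unchanged).
    offset = fmean([bias[s] for s, _, _ in data])
    bias = {s: b - offset for s, b in bias.items()}
    mu = {k: m + offset for k, m in mu.items()}
    var = fmean([(x - mu[k] - bias[s]) ** 2 for s, k, x in data])
    return mu, bias, var


def estimate_bias(mu, adapt_data):
    """Estimate a transform for a new target speaker from labelled
    adaptation data [(label, x), ...], keeping the canonical model fixed."""
    return fmean([x - mu[k] for k, x in adapt_data])


def classify(mu, bias, x):
    """Pick the label whose adapted mean mu[k] + bias is closest to x
    (the ML decision when all labels share one variance)."""
    return min(mu, key=lambda k: abs(x - (mu[k] + bias)))


if __name__ == "__main__":
    # Speaker A mostly says "b", speaker B mostly says "a".
    data = [("A", "a", 2.0), ("A", "b", 12.0), ("A", "b", 12.0),
            ("A", "b", 12.0), ("B", "a", -2.0), ("B", "a", -2.0),
            ("B", "a", -2.0), ("B", "b", 8.0)]
    mu, bias, var = train_sat_bias(data)
    # Recognition: estimate a transform for unseen speaker C, then decode.
    b_c = estimate_bias(mu, [("a", 5.0)])
    print(classify(mu, b_c, 15.5))  # → b
```

With this unbalanced toy data, simply pooling all speakers gives means of -1 and 11 for `a` and `b`, because each speaker's offset leaks into the label it says most often. The adaptively trained canonical means come out as 0 and 10, with the speaker offsets (+2 and -2) absorbed by the transforms.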

Keywords

#Robustness #Automatic speech recognition #Loudspeakers #Speech recognition #Training data #Target recognition #Feature extraction #Acoustical engineering #Data engineering #Power engineering and energy

