Maximum-likelihood training of the PLCG-based language model

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01, pp. 210-213

D.H. Van Uytsel¹, D. Van Compernolle¹, P. Wambacq¹

¹ESAT/PSI, Katholieke Universiteit Leuven, Belgium

Abstract

In Van Uytsel et al. (2001), a parsing language model based on a probabilistic left-corner grammar (PLCG) was proposed, and encouraging performance on a speech recognition task was reported. In this paper we show how the PLCG-based language model can be further optimized by iterative parameter reestimation on unannotated training data. Precalculating the forward, inner and outer probabilities of states in the PLCG network provides an elegant shortcut to computing the transition frequency expectations that are needed in each iteration of the proposed reestimation procedure. The training algorithm enables model training on very large corpora. In our experiments, test set perplexity is close to saturation after three iterations, at 5 to 16% below its initial value. However, we observed no significant improvement in recognition accuracy after reestimation.
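The reestimation step described above can be illustrated with a minimal sketch. It is not the authors' PLCG algorithm: it only shows the generic maximum-likelihood update that turns expected transition frequencies into new conditional probabilities, together with the usual perplexity formula. The function names, the `(context, transition)` keys and the example counts are hypothetical, and the expected counts are assumed to have been computed already (in the paper, from the forward, inner and outer probabilities).

```python
import math
from collections import defaultdict


def reestimate(expected_counts):
    """Normalize expected transition counts into conditional probabilities.

    expected_counts maps a hypothetical (context, transition) pair to its
    expected frequency under the current model.
    """
    totals = defaultdict(float)
    for (context, _transition), count in expected_counts.items():
        totals[context] += count
    return {
        (context, transition): count / totals[context]
        for (context, transition), count in expected_counts.items()
        if totals[context] > 0
    }


def perplexity(log_probs):
    """Perplexity from per-word natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


# Made-up expected counts, for illustration only.
counts = {("NP", "shift"): 3.0, ("NP", "project"): 1.0, ("VP", "shift"): 2.0}
probs = reestimate(counts)
```

In an iterative procedure of this kind, the expected counts would be recomputed with the updated probabilities, and the update repeated until test set perplexity stops improving.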

Keywords

#Natural languages #Speech recognition #Maximum likelihood estimation #Training data #Testing #Computer networks #Iterative algorithms #Large-scale systems #Stochastic processes #Predictive models

