Large vocabulary continuous speech recognition of Broadcast News – The Philips/RWTH approach

Speech Communication, Volume 37, Pages 109-131, 2002
P. Beyerlein (1), X. Aubert (1), R. Haeb-Umbach (1), M. Harris (1), D. Klakow (1), A. Wendemuth (1), S. Molau (2), H. Ney (2), Michael Pitz (2), A. Sixtus (2)
(1) Philips Research Laboratories, Weisshausstrasse 2, D-52066 Aachen, Germany
(2) Lehrstuhl für Informatik VI, Aachen University of Technology, D-52056 Aachen, Germany
