Smoothed language model incorporation for efficient time-synchronous beam search decoding in LVCSR

D. Willett, E. McDermott, S. Katagiri
Speech Open Lab, NTT Corporation, Kyoto, Japan

Abstract

For the decoding search in large vocabulary continuous speech recognition (LVCSR) with hidden Markov models (HMMs) and statistical language models, the most straightforward and popular approach is the time-synchronous beam search procedure. A drawback of this approach is that the language model weights are applied time-asynchronously during the search, which degrades performance, particularly when the search is performed with a tight pruning beam. This study presents a method for smoothing the language model within the recognition network. The optimization goal is to smear the transition probabilities from HMM state to HMM state so that the language model weights are applied more time-synchronously. In addition, state-based language model look-ahead is proposed and evaluated. Both language model smoothing techniques yield a marked improvement in the accuracy-to-run-time ratio, whereas combining them yields only limited improvement.
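
The abstract does not describe how the language model weight is distributed over the network. As a rough illustration of the general idea behind language model look-ahead (spreading a word's LM log-probability over the arcs of a lexical prefix tree so that most of it is applied early instead of all at once at the word end), the following Python sketch uses a unigram LM and a toy phone-level tree. The tree layout, the function names (`build_tree`, `compute_lookahead`, `smeared_increments`), the unigram assumption, and the phone-level (rather than HMM-state-level) granularity are all hypothetical; this is the textbook factoring of a look-ahead score, not necessarily the network smoothing or the state-based look-ahead proposed in the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of a hypothetical phone-level prefix (lexical) tree."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    words: List[str] = field(default_factory=list)  # words whose pronunciation ends here
    lookahead: float = -math.inf  # max log P(w) over all words reachable from this node


def build_tree(lexicon: Dict[str, List[str]]) -> Node:
    """Insert each word's phone sequence into a shared prefix tree."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.children.setdefault(ph, Node())
        node.words.append(word)
    return root


def compute_lookahead(node: Node, logp: Dict[str, float]) -> float:
    """Bottom-up pass: best unigram log-probability of any word reachable from `node`."""
    best = max((logp[w] for w in node.words), default=-math.inf)
    for child in node.children.values():
        best = max(best, compute_lookahead(child, logp))
    node.lookahead = best
    return best


def smeared_increments(root: Node, phones: List[str], word: str,
                       logp: Dict[str, float]) -> List[float]:
    """LM score contributions applied along one word's path through the tree.

    Entry: the root's look-ahead score (best word overall).
    Each arc: lookahead(child) - lookahead(parent), which is <= 0 because the
    set of reachable words can only shrink as we descend.
    Word end: correction logP(word) - lookahead(end node), nonzero only when the
    end node still reaches a better word (a homophone or a longer word).
    The increments telescope, so their sum is exactly logP(word).
    """
    increments = [root.lookahead]
    node = root
    for ph in phones:
        child = node.children[ph]
        increments.append(child.lookahead - node.lookahead)
        node = child
    increments.append(logp[word] - node.lookahead)
    return increments


if __name__ == "__main__":
    # Toy lexicon: "cat" and "cap" share the prefix k-ae; probabilities are made up.
    lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "dog": ["d", "ao", "g"]}
    logp = {"cat": math.log(0.5), "cap": math.log(0.2), "dog": math.log(0.3)}
    root = build_tree(lexicon)
    compute_lookahead(root, logp)
    for word, phones in lexicon.items():
        inc = smeared_increments(root, phones, word, logp)
        print(word, [round(x, 3) for x in inc], "total:", round(sum(inc), 3))
```

In this toy example the path for "cap" receives log 0.5 on tree entry and the remaining log(0.2/0.5) penalty on the arc "p", where it diverges from "cat", rather than only at the word end. A pruning decision taken partway through a word therefore already sees most of its LM score, which is in the spirit of the more time-synchronous weight application the abstract aims for.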

Keywords

Decoding, Hidden Markov models, Acoustic beams, Degradation, Smoothing methods, Viterbi algorithm, Context modeling, Laboratories, Speech recognition, Natural languages
