Enhanced MAP adaptation of n-gram language models using indirect correlation of distant words

T. Moriya¹, K. Hirose¹, N. Minematsu², Hui Jiang²
¹Graduate School of Frontier Sciences, University of Tokyo, Bunkyo, Tokyo, Japan
²Graduate School of Information Science and Technology, University of Tokyo, Bunkyo, Tokyo, Japan

Abstract

A novel and effective method for adapting n-gram language models to a new domain has been developed. We propose a heuristic adaptation method that uses the indirect correlation between words that are distant from each other, in addition to the conventional n-gram correlation, which captures only superficial, direct information about adjacent words. By adding the correlation of distant words, the adapted models incorporate more information about word co-occurrence in the target domain and achieve lower perplexity. Furthermore, because the new correlation covers indirect relations that do not appear in surface sentences, the adapted models still work well in domains somewhat different from the target domain. Experiments show that, compared with well-known MAP-based adaptation, the proposed method improves perplexity reduction by approximately 10%, both in the target domain and in another domain.
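The abstract does not give the estimator, so the following is only a minimal Python sketch under stated assumptions. For conventional adaptation it uses the standard MAP (Dirichlet-prior) estimate p(w|h) = (c_T(h,w) + τ·p_B(w|h)) / (c_T(h) + τ), where c_T are target-domain counts and p_B is the background model. As a hypothetical stand-in for the correlation of distant words, it adds co-occurrence counts of words 2 to `max_gap` positions apart, weighted by `beta`. The helper names, `beta`, `max_gap`, `tau`, the uniform background model, and the toy data are illustrative assumptions, not the authors' formulation; the paper's indirect (multi-step) correlation is not modelled here.

```python
import math
from collections import Counter


def adjacent_counts(sentences):
    """Conventional bigram counts c(h, w) for adjacent words."""
    counts = Counter()
    for s in sentences:
        for h, w in zip(s, s[1:]):
            counts[(h, w)] += 1
    return counts


def distant_counts(sentences, max_gap=3):
    """Hypothetical distant-word counts: b occurs 2..max_gap positions after a."""
    counts = Counter()
    for s in sentences:
        for i, a in enumerate(s):
            for d in range(2, max_gap + 1):
                if i + d < len(s):
                    counts[(a, s[i + d])] += 1
    return counts


def combine(adjacent, distant, beta):
    """Add distant counts to adjacent counts with weight beta (assumed scheme)."""
    combined = {k: float(v) for k, v in adjacent.items()}
    for k, v in distant.items():
        combined[k] = combined.get(k, 0.0) + beta * v
    return combined


def map_adapt(p_background, target_counts, tau=5.0):
    """MAP estimate with a Dirichlet prior centred on the background model:
    p(w | h) = (c_T(h, w) + tau * p_B(w | h)) / (c_T(h) + tau)."""
    totals = Counter()
    for (h, _), c in target_counts.items():
        totals[h] += c

    def prob(w, h):
        num = target_counts.get((h, w), 0.0) + tau * p_background(w, h)
        return num / (totals.get(h, 0.0) + tau)

    return prob


def perplexity(prob, sentences):
    """Bigram perplexity exp(-mean log p(w | h)) over all adjacent pairs."""
    log_sum, n = 0.0, 0
    for s in sentences:
        for h, w in zip(s, s[1:]):
            log_sum += math.log(prob(w, h))
            n += 1
    return math.exp(-log_sum / n)


# Toy target-domain data (illustrative); the background model is uniform over VOCAB.
TARGET = [["the", "stock", "market", "rose"],
          ["the", "stock", "price", "fell"],
          ["the", "market", "price", "rose"]]
VOCAB = ["the", "stock", "market", "rose", "price", "fell", "cat"]


def uniform(w, h):
    """Uniform background model p_B(w | h) = 1 / |V|."""
    return 1.0 / len(VOCAB)


adj = adjacent_counts(TARGET)
p_map = map_adapt(uniform, adj)
p_enhanced = map_adapt(uniform, combine(adj, distant_counts(TARGET), beta=0.5))

print(p_map("rose", "stock"), p_enhanced("rose", "stock"))
print(perplexity(uniform, TARGET), perplexity(p_map, TARGET))
```

Because "rose" never follows "stock" directly in the toy data, plain MAP adaptation gives that pair only prior mass, while the added distant counts raise its probability; in both models every history still sums to one over `VOCAB`, since the target totals c_T(h) are computed from the same (possibly augmented) counts.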

Keywords

Speech recognition, Information science, Multimedia communication, Multimedia systems, Electronic mail, Adaptation model, Vocabulary, Natural languages, Probability, Parameter estimation
