Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR

Computer Speech & Language - Tập 22 - Trang 360-373 - 2008
Yao Qian1,2, Frank K. Soong1,2, Tan Lee2
1Microsoft Research Asia, 5th Floor Beijing Sigma Center, No.49, Zhichun Road, Haidian District, Beijing 100080, PR China
2Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, PR China

Tài liệu tham khảo

Bauer, 1997, Modern Cantonese phonology, vol. 102 Cao, Y., Deng, Y., Zhang, H., Huang, T., Xu, B., 2000. Decision-tree based Mandarin tone model and its application to speech recognition. In: Proceedings of the ICASSP. Chao, 1930, system of tone letters, Le Maitre Phonetique, 45, 24 Chen, 1995, Tone recognition of continuous Mandarin speech based on neural networks, IEEE Trans. Speech Audio Process., 3, 146, 10.1109/89.366544 Chen, C.J., Gopinath, R.A., Monkowski, M.D., Picheny, M.A., Shen, K., 1997. New methods in continuous Mandarin speech recognition. In: Proceedings of the Eurospeech. Choi, W.N., Wong, Y.W., Lee, Tan., Ching, P.C., 2000. Lexical tree decoding with a class-based language model for Chinese speech recognition. In: Proceedings of the ICSLP. CUCorpora: Cantonese Spoken Language Resources, 2001. <http://dsp.ee.cuhk.edu.hk/speech/>. Evermann, G., Woodland, P.C., 2000. Posterior probability decoding: confidence estimation and system combination. In: Proceedings of the Speech Transcription Workshop. Fetter, P., Dandurand, F., Brietzmann, P.R., 1996. Word graph rescoring using confidence measures. In: Proceedings of the ICSLP. Goel, 2000, Minimum Bayes-risk automatic speech recognition, Comp. Speech Lang., 14, 115, 10.1006/csla.2000.0138 Hashimoto, 1972 Hirose, K., Zhang, J.S., 1999. Tone recognition of Chinese continuous speech using tone critical segments. In: Proceedings of the Eurospeech. Hong, 1997 Huang, H., Seide, F., 2000. Pitch tracking and tone features for Mandarin speech recognition. In: Proceedings of the ICASSP. Huang, 2001 Lee, 1995, Tone recognition of isolated Cantonese syllables, IEEE Trans. Speech Audio Process., 3, 204, 10.1109/89.388147 Lee, 2002, Using tone information in Cantonese continuous speech recognition, ACM Trans. Asian Lang. Inform. Process., 1, 83, 10.1145/595576.595581 Lin, 1996, Frameworks for recognition of Mandarin syllables with tones using sub-syllabic units, J. Speech Commun., 18, 175, 10.1016/0167-6393(95)00043-7 Lo, W.K., 2000. Cantonese phonology and phonetics: an engineering introduction. Internal Documentation. DSP & Speech Technology Laboratory of the Chinese University of Hong Kong. Lo, W.K., Soong, F.K., Nakamura, S., 2004. Robust verification of recognized words in noise. In: Proceedings of the ICSLP. Mangu, 2000, Finding consensus in speech recognition: word error minimization and other applications of confusion networks, Comp. Speech Lang., 14, 373, 10.1006/csla.2000.0152 Nelder, 1965, A simplex method for function minimization, Comp. J., 7, 308, 10.1093/comjnl/7.4.308 Neti, C., Roukos, S., Eide, E., 1997. Word-based confidence measures as a guide for stack search in speech recognition. In: Proceedings of ICASSP. Ortmanns, 1997, A word graph algorithm for large vocabulary continuous speech recognition, Comp. Speech Lang., 11, 43, 10.1006/csla.1996.0022 Peng, 2005, Tone recognition of continuous Cantonese speech based on support vector machines, J. Speech Commun., 45, 49, 10.1016/j.specom.2004.09.004 Qian, Y., 2005. Use of tone information in Cantonese LVCSR based on generalized character posterior probability decoding. Ph.D. Dissertation, The Chinese University of Hong Kong. Qian, Y., Lee, Tan., Li, Y.J., 2003. Overlapped di-tone modeling for tone recognition in continuous Cantonese speech. In: Proceedings of the Eurospeech. Qian, Y., Lee, Tan., Soong, F.K., 2004. Tone information as a confidence measure for improving Cantonese LVCSR. In: Proceedings of the ICSLP. Qian, Y., Soong, F.K., Lee, T., 2006. Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR. In: Proceedings of the ICASSP. Seide, F., Wang, N.J.C., 2000. Two-stream modeling of Mandarin tones. In: Proceedings of the ICSLP. Soong, F.K., Lo, W.K., Nakamura, S., 2004. Generalized word posterior probability (GWPP) for measuring reliability of recognized words. In: Proceedings of the SWIM. Soong, F.K., Lo, W.K., Nakamura, S., 2004. Optimal acoustic and language model weights for minimizing word verification errors. In: Proceedings of the ICSLP. Stolcke, A., Konig, Y., Weintraub, M., 1997. Explicit word error minimization in N-best list rescoring. In: Proceedings of the Eurospeech. Talkin, 1995, A robust algorithm for pitch tracking (RAPT), 495 Tian, Y., et al., 2004. Tone recognition with fractionized models and outlined features. In: Proceedings of the ICASSP. Wang, H.L., Qian, Y., Soong, F.K., Zhou, J.-L., Han, J.Q., 2006. A multi-space distribution (MSD) approach to speech recognition of tonal languages. In: Proceedings of the ICSLP. Weintraub, M., 1995. LVCSR log-likelihood ratio scoring for key-word spotting. In: Proceedings of the ICSLP. Wessel, F., Schluter, R., Ney, H., 2000. Using posterior probabilities for improved speech recognition. In: Proceedings of the ICASSP. Wessel, 2001, Confidence measures for large vocabulary continuous speech recognition, IEEE Trans. Speech Audio Process., 9, 288, 10.1109/89.906002 Wong, Y.W., Chang, E., 2001. The effect of pitch and tone on different Mandarin speech recognition tasks. In: Proceedings of the Eurospeech. Wong, Y.W., Chow, K.F., Lau, W., Lo, W.K., Lee, T., Ching, P.C., 1999. Acoustic modeling and language modeling for Cantonese LVCSR. In: Proceedings of the Eurospeech. Xu, 1997, Contextual tonal variation on Mandarin, J. Phonetics, 25, 61, 10.1006/jpho.1996.0034 Zhang, J.S., Hirose, K., 2000. Anchoring hypothesis and its application to tone recognition of Chinese continuous speech. In: Proceedings of the ICASSP.