Parameter reduction schemes for loosely coupled HMMs

Computer Speech & Language - Volume 17 - Pages 233-262 - 2003
H.J. Nock¹, M. Ostendorf¹
¹ Electrical Engineering Department, University of Washington, Seattle, WA 98195, USA
