Parameter reduction schemes for loosely coupled HMMs
References
Baum, 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics 41, 164. doi:10.1214/aoms/1177697196
Bourlard, H., Dupont, S., Ris, C., 1996. Multi-stream speech recognition. Technical Report IDIAP-RR 96-07, IDIAP
Brand, M., Oliver, N., Pentland, A., 1997. Coupled hidden Markov models for complex action recognition. In: Proceedings of IEEE CVPR, pp. 994–999
Byrne, W., Finke, M., Khudanpur, S., McDonough, J., Nock, H., Riley, M., Saraclar, M., Wooters, C., Zavaliagkos, G., 1998. Pronunciation modelling using a hand-labelled corpus for conversational speech recognition. In: Proceedings of ICASSP, pp. 313–316
Cole, R., Muthusamy, Y., Fanty, M., 1990. The ISOLET spoken letter database. Technical Report CSE 90-004, OGI
Daoudi, K., Fohr, D., Antoine, C., 2000. A new approach for multi-band speech recognition based on probabilistic graphical models. In: Proceedings of ICSLP, pp. I:329–332
Dempster, 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1.
Deng, 1992. Structural design of a hidden Markov model based speech recognizer using multi-valued phonetic features: comparison with segmental speech units. Journal of the Acoustical Society of America 92, 3058. doi:10.1121/1.404202
Finke, M., Fritsch, J., Koll, D., Waibel, A., 1999. Modeling and efficient decoding of large vocabulary conversational speech. In: Proceedings of Eurospeech, pp. 467–470
Finke, M., Waibel, A., 1997. Speaking mode dependent pronunciation modeling in large vocabulary conversational speech recognition. In: Proceedings of Eurospeech, pp. 2379–2382
Gales, M.J.F., 1995. Model-based techniques for noise robust speech recognition. PhD Thesis, Cambridge University Engineering Dept., Cambridge, UK
Ghahramani, 1997. Factorial hidden Markov models. Machine Learning 29, 245. doi:10.1023/A:1007425814087
Gillick, L., Cox, S.J., 1989. Some statistical issues in the comparison of speech recognition algorithms. In: Proceedings of ICASSP, pp. 532–535
Hain, T., Woodland, P.C., 1999. Dynamic HMM selection for continuous speech recognition. In: Proceedings of Eurospeech, pp. 532–535
Hain, T., Woodland, P.C., 2000. Modelling sub-phone insertions and deletions in continuous speech recognition. In: Proceedings of ICSLP, pp. IV:172–175
Hain, T., Woodland, P.C., Evermann, G., Povey, D., 2000. The CU-HTK march 2000 Hub5e transcription system. In: Proceedings of Speech Transcription Workshop
Hermansky, H., Tibrewala, S., Pavel, M., 1996. Towards ASR on partially corrupted speech. In: Proceedings of ICSLP, pp. 462–465
Huckvale, M.A., 1994. Word recognition from tiered phonological models. In: Proceedings of Institute of Acoustics Conference on Speech and Hearing, vol. 16, pp. 163–170
Humphries, J.J., Woodland, P.C., 1997. Using accent-specific pronunciation modelling for improved large vocabulary continuous speech recognition. In: Proceedings of Eurospeech, pp. 2367–2370
Kapadia, S., 1998. Discriminative training of hidden Markov models. PhD Thesis, Cambridge University Engineering Dept., Cambridge, UK
Keating, P., 1997. Word-level phonetic variation in large speech corpora. In: Pompino-Marschal, B. (Ed.), ZAS Working Papers in Linguistics. Available from <http://www.humnet.ucla.edu/humnet/linguistics/people/keating/berlin1.pdf>
King, S., Stephenson, T., Isard, S., Taylor, P., Strachan, A., 1998. Speech recognition via phonetically featured syllables. In: Proceedings of ICSLP, vol. 3, pp. 1031–1034
King, 2000. Detection of phonological features in continuous speech using neural networks. Computer Speech and Language 14, 333. doi:10.1006/csla.2000.0148
Kingsbury, P., Strassel, S., McLemore, C., 1997. COMLEX pronouncing lexicon (renamed in 1997 release as CALLHOME American English lexicon). Available from Linguistic Data Consortium <http://www.ldc.upenn.edu>
Kirchhoff, K., 1999. Robust speech recognition using articulatory information. PhD Thesis, University of Bielefeld, Germany
Lamel, L., Adda, G., 1996. On designing pronunciation lexicons for large vocabulary, continuous speech recognition. In: Proceedings of ICSLP, pp. 6–9
Lauritzen, 1996
Leonard, R.G., 1984. A database for speaker-independent digit recognition. In: Proceedings of ICASSP, pp. 42.11–14
Linde, 1980. An algorithm for vector quantizer design. IEEE Transactions on Communications 28, 84. doi:10.1109/TCOM.1980.1094577
Logan, B., Moreno, P.J., 1997. Factorial hidden Markov models for speech recognition: preliminary experiments. Technical Report 97/7, Cambridge Research Laboratory
Mak, B., Tam, Y.-C., 2000. Asynchrony with trained transition probabilities improves performance in multi-band speech recognition. In: Proceedings of ICSLP, pp. IV:149–152
McMahon, P., McCourt, P., Vaseghi, S., 1998. Discriminative weighting of multi-resolution sub-band cepstral features for speech recognition. In: Proceedings of ICSLP, pp. 1055–1058
Mirghafori, N., 1999. A Multi-band approach to automatic speech recognition. PhD Thesis, ICSI, UC Berkeley, CA, USA
Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., Sison, J., Mashari, J., Zhou, J., 2000. Audio–visual speech recognition. Technical Report, The Johns Hopkins University (Center for Language and Speech Processing) Summer Research Workshop
NIST. Score package. Available from <http://www.nist.gov/speech/tools/index.htm>
Nock, H.J., 2001. Techniques for modelling phonological processes in automatic speech recognition. PhD Thesis, Cambridge University Engineering Dept., Cambridge, UK, August
Nock, H.J., Young, S.J., 2000. Loosely-coupled HMMs for ASR. In: Proceedings of ICSLP, pp. III:143–146
Nock, 2002. Modelling asynchrony in automatic speech recognition using loosely coupled hidden Markov models. Cognitive Science 26, 283. doi:10.1207/s15516709cog2603_5
Odell, J., 1995. The use of context in large vocabulary speech recognition. PhD Thesis, Cambridge University Engineering Dept., Cambridge, UK
Ostendorf, M., 2000. Incorporating linguistic theories of pronunciation variation into speech recognition models. In: Philosophical Transactions of the Royal Society, vol. 358, London, UK, pp. 1325–1338
Ostendorf, 1997. HMM topology design using maximum likelihood successive state splitting. Computer Speech and Language 11, 17. doi:10.1006/csla.1996.0021
Rabiner, 1993
Rabiner, 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257. doi:10.1109/5.18626
Richmond, K., 2001. Mixture density networks, human articulatory data and acoustic-to-articulatory inversion of continuous speech. In: Proceedings of Institute of Acoustics WISP 2001, Stratford-upon-Avon, UK
Riley, M.D., 1991. A statistical model for generating pronunciation networks. In: Proceedings of ICASSP, pp. 737–740
Saraclar, M., 2000. Pronunciation modeling for conversational speech recognition. PhD Thesis, The Johns Hopkins University, MD, USA
Saraclar, 2000. Pronunciation modeling by sharing Gaussian densities across phonetic models. Computer Speech and Language 14, 137. doi:10.1006/csla.2000.0140
Saul, 1999. Mixed memory Markov models. Machine Learning 37, 75. doi:10.1023/A:1007649326333
Tomlinson, M.J., Russell, M.J., Brooke, N.M., 1996. Integrating audio and visual information to provide highly robust speech recognition. In: Proceedings of ICASSP, vol. II, pp. 821–824
Tomlinson, M.J., Russell, M.J., Moore, R.K., Buckland, A.P., Fawley, M.A., 1997. Modelling asynchrony in speech using elementary single-signal decomposition. In: Proceedings of ICASSP, pp. 1247–1250
Varga, A.P., Moore, R.K., 1990. Hidden Markov model decomposition of speech and noise. In: Proceedings of ICASSP, pp. 845–848
Vaxelaire, B., Sock, R., Perrier, P., 2000. Gestural overlap, place of articulation and speech rate: an X-ray investigation. In: Proceedings of ICSLP, pp. II:166–169
Weintraub, M., Stolcke, A., Sankar, A., 1995. SRI Switchboard progress and experiments. In: Proceedings of DARPA LVCSR Workshop
Weintraub, M., Wegmann, S., Kao, Y.-H., Khudanpur, S., Galles, C., Fosler, E., Saraclar, M., 1996. Automatic learning of word pronunciation from data. Technical Report, The Johns Hopkins University (Center for Language and Speech Processing) Summer Research Workshop
Wilkinson, N., Russell, M.J., 2001. Progress towards improved speech modelling using asynchronous sub-bands and formant frequencies. In: Proceedings of Institute of Acoustics WISP, Stratford-upon-Avon, UK
Young, S., Jansen, J., Odell, J., Ollason, D., Woodland, P., 1995. The HTK Book (Version 2.0). ECRL