Prosodic and other cues to speech recognition failures
References
Ammicht, E., Potamianos, A., Fosler-Lussier, E., 2001. Ambiguity representation and resolution in spoken dialogue systems. In: Proc. EUROSPEECH-01, Aalborg, pp. 2217–2220
Andorno, M., Laface, P., Gemello, R., 2002. Experiments in confidence scoring for word and sentence verification. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 1377–1381
Bell, L., Gustafson, J., 1999. Repetition and its phonetic realizations: Investigating a Swedish database of spontaneous computer-directed speech. In: Proc. Internat. Congress of Phonetic Sciences-99, San Francisco, pp. 1221–1224
Blaauw, E., 1992. Phonetic differences between read and spontaneous speech. In: Proc. Internat. Conf. on Spoken Language Processing-92, Banff, Vol. 1, pp. 751–758
Bouwman, A.G., Sturm, J., Boves, L., 1999. Incorporating confidence measures in the Dutch train timetable information system developed in the ARISE project. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing, Phoenix, Vol. 1, pp. 493–496
Bruce, G., 1995. Modelling Swedish intonation for read and spontaneous speech. In: Proc. Internat. Congress of Phonetic Sciences, Stockholm, Vol. 2, pp. 28–35
Cohen, W., 1996. Learning trees and rules with set-valued features. In: 14th Conference of the American Association of Artificial Intelligence, AAAI, Portland, pp. 709–716
Doddington, G., Liggett, W., Martin, A., Przybocki, M., Reynolds, D., 1998. Sheep, goats, lambs and wolves: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. In: Proc. Internat. Conf. on Spoken Language Processing-98, Sydney, pp. 608–611
Falavigna, D., Gretter, R., Riccardi, G., 2002. Acoustic and word lattice based algorithms for confidence scores. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 1621–1624
Fant, G., Liljencrants, J., Karlsson, I., Båvegård, M., 1995. Time and frequency domain aspects of voice source modelling. BR Speechmaps 6975, ESPRIT. Deliverable 27 WP 1.3
Guillevic, D., Gandrabur, S., Normandin, Y., 2002. Robust semantic confidence scoring. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 853–856
Hirose, 1997. Disambiguating recognition results by prosodic features, p. 327
Hirschberg, J., 1991. Using text analysis to predict intonational boundaries. In: Proc. Second European Conference on Speech Communication and Technology, Genova, pp. 1275–1278
Hirschberg, J., 1995. Prosodic and other acoustic cues to speaking style in spontaneous and read speech. In: Proc. Internat. Congress of Phonetic Sciences, Stockholm, Vol. 2, pp. 36–43
Hirschberg, J., Litman, D., Swerts, M., 1999. Prosodic cues to recognition errors. In: Proc. Automatic Speech Recognition and Understanding Workshop (ASRU'99), Keystone, pp. 349–352
Hirschberg, J., Litman, D., Swerts, M., 2001. Identifying user corrections automatically in spoken dialogue systems. In: Proc. NAACL-01, Pittsburgh, pp. 208–215
Kamm, C., Narayanan, S., Dutton, D., Ritenour, R., 1997. Evaluating spoken dialog systems for telecommunication services. In: Proc. EUROSPEECH-97, Rhodes, pp. 2203–2206
Kraayeveld, H., 1997. Idiosyncrasy in prosody. Speaker and speaker group identification in Dutch using melodic and temporal information. Ph.D. thesis, Nijmegen University
Krahmer, 2001. Error detection in spoken human–machine interaction. International Journal of Speech Technology, Vol. 4, p. 19. doi:10.1023/A:1009648614566
Levow, G.-A., 1998. Characterizing and recognizing spoken corrections in human–computer dialogue. In: Proc. 36th Annual Meeting of the Association of Computational Linguistics, COLING/ACL 98, Montreal, pp. 736–742
Litman, D., Pan, S., 1999. Empirically evaluating an adaptable spoken dialogue system. In: Proc. 7th International Conference on User Modeling (UM), Banff, pp. 55–64
Litman, D., Walker, M., Kearns, M., 1999. Automatic detection of poor speech recognition at the dialogue level. In: Proc. 37th Annual Meeting of the Association of Computational Linguistics, ACL99, College Park, pp. 309–316
Litman, D., Hirschberg, J., Swerts, M., 2001. Predicting user reactions to system error. In: Proc. ACL-2001, Toulouse, pp. 329–369
Moreno, P.J., Logan, B., Raj, B., 2001. A boosting approach for confidence scoring. In: Proc. EUROSPEECH-01, Aalborg, pp. 2109–2112
Ostendorf, M., Byrne, B., Bacchiani, M., Finke, M., Gunawardana, A., Ross, K., Roweis, S., Shriberg, E., Talkin, D., Waibel, A., Wheatley, B., Zeppenfeld, T., 1997. Modeling systematic variations in pronunciation via a language-dependent hidden speaking mode. Report on 1996 CLSP/JHU Workshop on Innovative Techniques for Large Vocabulary Continuous Speech Recognition
Oviatt, S.L., Levow, G., MacEachern, M., Kuhn, K., 1996. Modeling hyperarticulate speech during human–computer error resolution. In: Proc. Internat. Conf. on Spoken Language Processing-96, Philadelphia, pp. 801–804
Rahim, M., Pieraccini, R., Eckert, W., Levin, E., Di Fabbrizio, G., Riccardi, G., Lin, C., Kamm, C., 1999. W99: A spoken dialogue system for the ASRU'99 workshop. In: Proc. ASRU'99, Keystone
Sharp, R.D., Bocchieri, E., Castillo, C., Parthasarathy, S., Rath, C., Riley, M., Rowland, J., 1997. The Watson speech recognition engine. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing-97, Munich, pp. 4065–4068
Soltau, H., Waibel, A., 1998. On the influence of hyperarticulated speech on recognition performance. In: Proc. Internat. Conf. on Spoken Language Processing-98, Sydney, pp. 225–228
Soltau, H., Waibel, A., 2000. Specialized acoustic models for hyperarticulated speech. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing 2000, Istanbul, pp. 1779–1782
Soltau, H., Metze, F., Waibel, A., 2002. Compensating for hyperarticulation by modeling articulatory properties. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 83–86
Swerts, 1997. Prosodic and lexical indications of discourse structure in human–machine interactions. Speech Communication, Vol. 22, p. 25. doi:10.1016/S0167-6393(97)00011-3
Swerts, M., Veldhuis, R., 1997. Interactions between intonation and glottal-pulse characteristics. In: Botinis, A., Kouroupetroglou, G., Carayiannis, G. (Eds.), Intonation: Theory, Models and Applications, Athens, pp. 297–300
Swerts, M., Litman, D., Hirschberg, J., 2000. Corrections in spoken dialogue systems. In: Proc. Internat. Conf. on Spoken Language Processing-00, Beijing, pp. 615–618
Talkin, D., 1995. A robust algorithm for pitch tracking (RAPT). In: Kleijn, W.B., Paliwal, K.K. (Eds.), Speech Coding and Synthesis, Athens, pp. 495–518
Veilleux, N., 1994. Computational models of the prosody/syntax mapping for spoken language systems. Ph.D. thesis, Boston University
Wade, E., Shriberg, E.E., Price, P.J., 1992. User behaviors affecting speech recognition. In: Proc. Internat. Conf. on Spoken Language Processing-92, Banff, Vol. 2, pp. 995–998
Walker, M., Fromer, J., Narayanan, S., 1998. Learning optimal dialogue strategies: A case study of a spoken dialogue agent for email. In: Proc. ACL/COLING, Montreal, pp. 1345–1352
Walker, M., Kamm, C., Litman, D., 2000a. Towards developing general models of usability with PARADISE. Natural Language Engineering: Special Issue on Best Practice in Spoken Language Dialogue System Engineering, Vol. 6, pp. 363–377
Walker, M., Langkilde, I., Wright, J., Gorin, A., Litman, D., 2000b. Learning to predict problematic situations in a spoken dialogue system: Experiments with How may I help you? In: Proc. NAACL-00, Seattle, pp. 210–217
Wang, H.-M., Lin, Y.-C., 2002. Error-tolerant spoken language understanding with confidence measuring. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 1625–1628
Weintraub, M., Taussig, K., Hunicke-Smith, K., Snodgrass, A., 1996. Effect of speaking style on LVCSR performance. In: Proc. Internat. Conf. on Spoken Language Processing-96, Philadelphia, pp. S16–S19 (addendum)
Zeljkovic, I., 1996. Decoding optimal state sequences with smooth state likelihoods. In: International Conference on Acoustics, Speech, and Signal Processing 96, Atlanta, pp. 129–132
Zhang, R., Rudnicky, A., 2001. Word level confidence annotation using combinations of features. In: Proc. EUROSPEECH-01, Aalborg, pp. 2105–2108