Prosodic and other cues to speech recognition failures

Speech Communication - Volume 43 - Pages 155-175 - 2004
Julia Hirschberg (1), Diane Litman (2), Marc Swerts (3)
(1) Department of Computer Science, Columbia University, 1241 Amsterdam Avenue, M/C 0401, New York, NY 10027, USA
(2) Department of Computer Science, University of Pittsburgh, 210 South Bouquet Street, Pittsburgh, PA 15260, USA, and LRDC, University of Pittsburgh, 3939 O'Hara Street, Pittsburgh, PA 15260, USA
(3) Faculty of Arts, Communication & Cognition, University of Tilburg, P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands, and CNTS, University of Antwerp, Universiteitsplein 1, B-2610 Wilrijk, Belgium

References

Ammicht, E., Potamianos, A., Fosler-Lussier, E., 2001. Ambiguity representation and resolution in spoken dialogue systems. In: Proc. EUROSPEECH-01, Aalborg, pp. 2217–2220.
Andorno, M., Laface, P., Gemello, R., 2002. Experiments in confidence scoring for word and sentence verification. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 1377–1381.
Bell, L., Gustafson, J., 1999. Repetition and its phonetic realizations: Investigating a Swedish database of spontaneous computer-directed speech. In: Proc. Internat. Congress of Phonetic Sciences-99, San Francisco, pp. 1221–1224.
Blaauw, E., 1992. Phonetic differences between read and spontaneous speech. In: Proc. Internat. Conf. on Spoken Language Processing-92, Banff, Vol. 1, pp. 751–758.
Bouwman, A.G., Sturm, J., Boves, L., 1999. Incorporating confidence measures in the Dutch train timetable information system developed in the ARISE project. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing, Phoenix, Vol. 1, pp. 493–496.
Bruce, G., 1995. Modelling Swedish intonation for read and spontaneous speech. In: Proc. Internat. Congress of Phonetic Sciences, Stockholm, Vol. 2, pp. 28–35.
Cohen, W., 1996. Learning trees and rules with set-valued features. In: 14th Conference of the American Association of Artificial Intelligence, AAAI, Portland, pp. 709–716.
Doddington, G., Liggett, W., Martin, A., Przybocki, M., Reynolds, D., 1998. Sheep, goats, lambs and wolves: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. In: Proc. Internat. Conf. on Spoken Language Processing-98, Sydney, pp. 608–611.
Falavigna, D., Gretter, R., Riccardi, G., 2002. Acoustic and word lattice based algorithms for confidence scores. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 1621–1624.
Fant, G., Liljencrants, J., Karlsson, I., Båvegård, M., 1995. Time and frequency domain aspects of voice source modelling. BR Speechmaps 6975, ESPRIT. Deliverable 27 WP 1.3.
Guillevic, D., Gandrabur, S., Normandin, Y., 2002. Robust semantic confidence scoring. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 853–856.
Hirose, 1997. Disambiguating recognition results by prosodic features, 327.
Hirschberg, J., 1991. Using text analysis to predict intonational boundaries. In: Proc. Second European Conference on Speech Communication and Technology, Genova, pp. 1275–1278.
Hirschberg, J., 1995. Prosodic and other acoustic cues to speaking style in spontaneous and read speech. In: Proc. Internat. Congress of Phonetic Sciences, Stockholm, Vol. 2, pp. 36–43.
Hirschberg, J., Litman, D., Swerts, M., 1999. Prosodic cues to recognition errors. In: Proc. Automatic Speech Recognition and Understanding Workshop (ASRU'99), Keystone, pp. 349–352.
Hirschberg, J., Litman, D., Swerts, M., 2001. Identifying user corrections automatically in spoken dialogue systems. In: Proc. NAACL-01, Pittsburgh, pp. 208–215.
Kamm, C., Narayanan, S., Dutton, D., Ritenour, R., 1997. Evaluating spoken dialog systems for telecommunication services. In: Proc. EUROSPEECH-97, Rhodes, pp. 2203–2206.
Kraayeveld, H., 1997. Idiosyncrasy in prosody. Speaker and speaker group identification in Dutch using melodic and temporal information. Ph.D. thesis, Nijmegen University.
Krahmer, 2001. Error detection in spoken human–machine interaction. International Journal of Speech Technology 4, 19. doi:10.1023/A:1009648614566
Levow, G.-A., 1998. Characterizing and recognizing spoken corrections in human–computer dialogue. In: Proc. 36th Annual Meeting of the Association of Computational Linguistics, COLING/ACL 98, Montreal, pp. 736–742.
Litman, D., Pan, S., 1999. Empirically evaluating an adaptable spoken dialogue system. In: Proc. 7th International Conference on User Modeling (UM), Banff, pp. 55–64.
Litman, D., Walker, M., Kearns, M., 1999. Automatic detection of poor speech recognition at the dialogue level. In: Proc. 37th Annual Meeting of the Association of Computational Linguistics, ACL99, College Park, pp. 309–316.
Litman, D., Hirschberg, J., Swerts, M., 2001. Predicting user reactions to system error. In: Proc. ACL-2001, Toulouse, pp. 329–369.
Moreno, P.J., Logan, B., Raj, B., 2001. A boosting approach for confidence scoring. In: Proc. EUROSPEECH-01, Aalborg, pp. 2109–2112.
Ostendorf, M., Byrne, B., Bacchiani, M., Finke, M., Gunawardana, A., Ross, K., Roweis, S., Shriberg, E., Talkin, D., Waibel, A., Wheatley, B., Zeppenfeld, T., 1997. Modeling systematic variations in pronunciation via a language-dependent hidden speaking mode. Report on 1996 CLSP/JHU Workshop on Innovative Techniques for Large Vocabulary Continuous Speech Recognition.
Oviatt, S.L., Levow, G., MacEachern, M., Kuhn, K., 1996. Modeling hyperarticulate speech during human–computer error resolution. In: Proc. Internat. Conf. on Spoken Language Processing-96, Philadelphia, pp. 801–804.
Rahim, M., Pieraccini, R., Eckert, W., Levin, E., Di Fabbrizio, G., Riccardi, G., Lin, C., Kamm, C., 1999. W99 - a spoken dialogue system for the ASRU'99 workshop. In: Proc. ASRU'99, Keystone.
Sharp, R.D., Bocchieri, E., Castillo, C., Parthasarathy, S., Rath, C., Riley, M., Rowland, J., 1997. The Watson speech recognition engine. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing-97, Munich, pp. 4065–4068.
Soltau, H., Waibel, A., 1998. On the influence of hyperarticulated speech on recognition performance. In: Proc. Internat. Conf. on Spoken Language Processing-98, Sydney, pp. 225–228.
Soltau, H., Waibel, A., 2000. Specialized acoustic models for hyperarticulated speech. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing 2000, Istanbul, pp. 1779–1782.
Soltau, H., Metze, H., Waibel, A., 2002. Compensating for hyperarticulation by modeling articulatory properties. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 83–86.
Swerts, 1997. Prosodic and lexical indications of discourse structure in human–machine interactions. Speech Communication 22, 25. doi:10.1016/S0167-6393(97)00011-3
Swerts, M., Veldhuis, R., 1997. Interactions between intonation and glottal-pulse characteristics. In: Botinis, A., Kouroupetroglou, G., Carayiannis, G. (Eds.), Intonation: Theory, Models and Applications, Athens, pp. 297–300.
Swerts, M., Litman, D., Hirschberg, J., 2000. Corrections in spoken dialogue systems. In: Proc. Internat. Conf. on Spoken Language Processing-00, Beijing, pp. 615–618.
Talkin, D., 1995. A robust algorithm for pitch tracking (RAPT). In: Kleijn, W.B., Paliwal, K.K. (Eds.), Speech Coding and Synthesis, Athens, pp. 495–518.
Veilleux, N., 1994. Computational models of the prosody/syntax mapping for spoken language systems. Ph.D. thesis, Boston University.
Wade, E., Shriberg, E.E., Price, P.J., 1992. User behaviors affecting speech recognition. In: Proc. Internat. Conf. on Spoken Language Processing-92, Banff, Vol. 2, pp. 995–998.
Walker, M., Fromer, J., Narayanan, S., 1998. Learning optimal dialogue strategies: A case study of a spoken dialogue agent for email. In: Proc. ACL/COLING, Montreal, pp. 1345–1352.
Walker, M., Kamm, C., Litman, D., 2000a. Towards developing general models of usability with PARADISE. Natural Language Engineering: Special Issue on Best Practice in Spoken Language Dialogue System Engineering, Vol. 6, pp. 363–377.
Walker, M., Langkilde, I., Wright, J., Gorin, A., Litman, D., 2000b. Learning to predict problematic situations in a spoken dialogue system: Experiments with How may I help you? In: Proc. NAACL-00, Seattle, pp. 210–217.
Wang, H.-M., Lin, Y.-C., 2002. Error-tolerant spoken language understanding with confidence measuring. In: Proc. Internat. Conf. on Spoken Language Processing-02, Denver, pp. 1625–1628.
Weintraub, M., Taussig, K., Hunicke-Smith, K., Snodgrass, A., 1996. Effect of speaking style on LVCSR performance. In: Proc. Internat. Conf. on Spoken Language Processing-96, Philadelphia, pp. S16–S19 (addendum).
Zeljkovic, I., 1996. Decoding optimal state sequences with smooth state likelihoods. In: International Conference on Acoustics, Speech, and Signal Processing 96, Atlanta, pp. 129–132.
Zhang, R., Rudnicky, A., 2001. Word level confidence annotation using combinations of features. In: Proc. EUROSPEECH-01, Aalborg, pp. 2105–2108.