Evaluation and usability of multimodal spoken language dialogue systems

Speech Communication - Volume 43 - Pages 33-54 - 2004
Laila Dybkjær¹, Niels Ole Bernsen¹, Wolfgang Minker²
¹Natural Interactive Systems Laboratory, Science Park 10, 5230 Odense M, Denmark
²Department of Information Technology, University of Ulm, Albert-Einstein-Allee 43, 89081 Ulm, Germany

References

Allen, 1995, The TRAINS project: A case study in building a conversational planning agent, Journal of Experimental and Theoretical AI (JETAI), 7, 7, 10.1080/09528139508953799
Almeida, L., Amdal, I., Beires, N., Boualem, M., Boves, L., den Os, E., Filoche, P., Gomes, R., Knudsen, J., Kvale, K., Rugelbak, J., Tallec, C., Warakagoda, N., 2002. Implementing and evaluating a multimodal tourist guide. In: Proceedings of the International CLASS Workshop on Natural, Intelligent and Effective Interaction in Multimodal Dialogue Systems, Copenhagen. pp. 1–7
Baekgaard, A., Bernsen, N.O., Brøndsted, T., Dalsgaard, P., Dybkjær, H., Dybkjær, L., Kristiansen, J., Larsen, L., Lindberg, B., Maegaard, B., Music, B., Offersgaard, L., Povlsen, C., 1995. The Danish spoken dialogue project: a general overview. In: Proceedings of the ESCA Workshop on Spoken Dialogue Systems, Vigsø, Denmark. pp. 89–92
Batliner, A., Fischer, K., Huber, R., Spilker, J., Nöth, E., 2000. Desperately seeking emotions: Actors, wizards, and human beings. In: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, Belfast. pp. 195–200
Beringer, N., Kartal, U., Louka, K., Schiel, F., Türk, U., 2002. PROMISE: a procedure for multimodal interactive system evaluation. In: Proceedings of the LREC Workshop on Multimodal Resources and Multimodal Systems Evaluation, Las Palmas. pp. 77–80
Bernsen, 1994, Foundations of multimodal representations: A taxonomy of representational modalities, Interacting with Computers, 6, 347, 10.1016/0953-5438(94)90008-6
Bernsen, 1997, Towards a tool for predicting speech functionality, Speech Communication, 23, 181, 10.1016/S0167-6393(97)00046-0
Bernsen, 2002, Multimodality in language and speech systems: from theory to design support tool, 93
Bernsen, 2003, User modelling in the car, 378
Bernsen, 1998
Bernsen, 1999, A theory of speech in multimodal systems, 105
Bernsen, N.O., Dybkjær, L., 2000. A methodology for evaluating spoken language dialogue systems and their components. In: Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000), Athens. pp. 183–188
Bickmore, T., Cassell, J., 2002. Phone vs. face-to-face with virtual persons. In: Proceedings of the International CLASS Workshop on Natural, Intelligent and Effective Interaction in Multimodal Dialogue Systems, Copenhagen. pp. 15–22
Boros, M., Eckert, W., Gallwitz, F., Görz, G., Hanrieder, G., Niemann, H., 1996. Towards understanding spontaneous speech: Word accuracy vs. concept accuracy. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP 1996), Philadelphia, Vol. 2. pp. 1009–1012
Bossemeyer, 1991, Automated alternate billing services at Ameritech: Speech recognition and the human interface, Speech Technology Magazine, 5, 24
Brusilovsky, P., Corbett, A., de Rosis, F. (Eds.), 2003. User Modeling 2003: Proceedings of the 9th International Conference, UM 2003, Johnstown, PA, USA, June 2003. Springer Lecture Notes in Artificial Intelligence, Vol. 2702
Buisine, S., Martin, J.C., 2003. Experimental evaluation of bi-directional multimodal interaction with conversational agents. In: Proceedings of the Ninth IFIP TC13 International Conference on Human–Computer Interaction (INTERACT 2003), Zürich, Switzerland
Bühler, D., Minker, W., Häussler, J., Krüger, S., 2002. Flexible multimodal human–machine interaction in mobile environments. In: Proceedings of the ECAI Workshop on Artificial Intelligence in Mobile System (AIMS), Lyon. pp. 66–70
2000
Cohen, 2003, Facial expression recognition from video sequences: Temporal and static modeling, Computer Vision and Image Understanding, Special Issue on Face Recognition, 91, 160, 10.1016/S1077-3142(03)00081-X
Cohen, P., Johnston, M., McGee, D., Oviatt, S., Pittman, J., Smith, I., Chen, L., Clow, J., 1997. QuickSet: Multimodal interaction for distributed applications. In: Fifth ACM International Multimedia Conference. pp. 31–40
Dybkjær, 2000, Usability issues in spoken language dialogue systems, Natural Language Engineering, Special Issue on Best Practice in Spoken Language Dialogue System Engineering, 6, 243
Dybkjær, L., Bernsen, N.O., Blasig, R., Buisine, S., Fredriksson, M., Gustafson, J., Martin, J.C., Wirén, M., 2003. Evaluation criteria and evaluation plan. Technical Report, NICE Deliverable D7.1, University of Southern Denmark
Dybkjær, L., Bernsen, N.O., Carlson, R., Chase, L., Dahlbäck, N., Failenschmid, K., Heid, U., Heisterkamp, P., Jönsson, A., Kamp, H., Karlsson, I., Kuppevelt, J., Lamel, L., Paroubek, P., Williams, D., 1998a. The DISC approach to spoken language systems development and evaluation. In: Proceedings of the First International Conference on Language Resources and Evaluation (LREC 1998), Granada. pp. 185–189
Dybkjær, 1998, A methodology for diagnostic evaluation of spoken human–machine dialogue, International Journal of Human Computer Studies, Special Issue on Miscommunication, 48, 605, 10.1006/ijhc.1997.0183
Ekman, 1975
Ferguson, G., Allen, J., 1998. TRIPS: An integrated intelligent problem-solving assistant. In: Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI 1998), Madison, USA. pp. 567–572
Fraser, N., 1997. Spoken dialogue system evaluation: A first framework for reporting results. In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech 1997), Rhodes. pp. 1907–1910
Gärtner, U., König, W., Wittig, T., 2001. Evaluation of manual vs. speech input when using a driver information system in real traffic. In: Proceedings of the International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Aspen, Colorado
Gibbon, D., Moore, R., Winski, R., 1997. Handbook of Standards and Resources for Spoken Language Systems. Walter de Gruyter
Gilbert, N., Cheepen, C., Failenschmid, K., Williams, D., 1999. Guidelines for advanced spoken dialogue design. Available from <http://www.soc.surrey.ac.uk/research/guidelines>
Glass, J., Polifroni, J., Seneff, S., Zue, V., 2000. Data collection and performance evaluation of spoken dialogue systems: The MIT experience. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP 2000), Beijing, Vol. 4. pp. 1–4
Grice, 1975, Logic and conversation, 41
Gustafson, J., Bell, L., Beskow, J., Boye, J., Carlson, R., Edlund, J., Granström, B., House, D., Wirén, M., 2000. AdApt: a multimodal conversational dialogue system in an apartment domain. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP 2000), Beijing, Vol. 2. pp. 134–137
Gustafson, J., Lindberg, N., Lundeberg, M., 1999. The August spoken dialogue system. In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech 1999), Budapest. pp. 1151–1154
Hanrieder, G., Heisterkamp, P., Brey, T., 1998. Fly with the EAGLES: Evaluation of the ACCeSS spoken dialogue system. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP 1998), Sydney. pp. 503–506
Hirschberg, J., Swerts, M., Litman, D., 2001. Labeling corrections and aware sites in spoken dialogue systems. In: Proceedings of the 2nd SIGdial Workshop on Discourse and Dialogue, Aalborg, Denmark. pp. 72–79
Hjalmarson, A., 2002. Evaluating AdApt, a multi-modal conversational dialogue system using PARADISE. Ph.D. Thesis, KTH, Stockholm
Karlsson, I., 1999. A survey of existing methods and tools for development and evaluation of speech synthesis and speech synthesis quality in SLDSs. Technical Report, DISC Deliverable D2.3
King, M., Maegaard, B., Schutz, J., des Tombes, L., 1996. EAGLES: Evaluation of natural language processing systems. Technical Report EAG-EWG-PR.2
Komatani, K., Ueno, S., Kawahara, T., Okuno, H., 2003. User modeling in spoken dialogue systems for flexible guidance generation. In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech 2003), Geneva. pp. 745–748
Larsen, L.B., 2003. Assessment of spoken dialogue system usability: What are we really measuring? In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech 2003), Geneva. pp. 1945–1948
Leavitt, 2003, Two technologies vie for recognition in speech market, IEEE Computer, 36, 13, 10.1109/MC.2003.1204316
Mariani, J., 2002. Technolangue: language technology. In: Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas. Available from <http://www.lrec-conf.org/lrec2002/lrec/isplre/ISP_Description_2002.pdf>
Mariani, J., Paroubek, P., 1999. Human language technologies evaluation in the European framework. In: Proceedings of the DARPA Broadcast News Workshop. pp. 237–242
Minker, W., Haiber, U., Heisterkamp, P., Scheible, S., 2002. Intelligent dialogue strategy for accessing infotainment applications in mobile environments. In: Proceedings of the ISCA Workshop on Multi-Modal Spoken Dialogue in Mobile Environments. European Speech Communication Association, Kloster Irsee, Germany. Bonn
Oviatt, S., 1997. Multimodal interactive maps: Designing for human performance. Human–Computer Interaction, Special Issue on Multimodal Interfaces. pp. 93–129
Oviatt, S., 2001. Advances in the robust processing of multimodal speech and pen systems. In: Yuen, P.C., Y. T., Wang, P. (Eds.), Multimodal Interfaces for Human Machine Communication. World Scientific Publisher, London, Series on Machine Perception and Artificial Intelligence. pp. 203–218
Pallett, D., Fiscus, J., Fisher, W., Garofolo, J., Lund, B., Martin, A., Przybocki, M., 1994. 1994 benchmark tests for the ARPA spoken language program. In: Proceedings of the ARPA Workshop on Spoken Language Technology. pp. 5–36
Peckham, J., 1993. A new generation of spoken dialogue systems: Results and lessons from the SUNDIAL project. In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech 1993), Berlin. pp. 33–40
Polifroni, J., Seneff, S., 2000. Galaxy-II as an architecture for spoken dialogue evaluation. In: Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000), Athens. pp. 725–730
Ramshaw, L., Boisen, S., 1990. An SLS answer comparator. Technical Report, BBN Systems and Technologies Corporation. SLS Note 7
Roth, 1997, Towards an information visualization workspace: Combining multiple means of expression, Human–Computer Interaction, 12, 131, 10.1207/s15327051hci1201&2_5
Sanders, G., Le, A., Garofolo, J., 2002. Effects of word error rate in the DARPA Communicator data during 2000 and 2001. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP 2002), Denver. pp. 277–280
Seneff, S., Hurley, E., Lau, R., Pao, C., Schmid, P., Zue, V., 1998. Galaxy-II: A reference architecture for conversational system development. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP 1998), Sydney. pp. 931–934
Simpson, A., Fraser, N., 1993. Blackbox and glassbox evaluation of the SUNDIAL system. In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech 1993), Berlin. pp. 1423–1426
Sturm, J., Cranen, B., Wang, F., Terken, J., Bakx, I., 2002. The effect of prolonged use on multimodal interaction. In: Proceedings of the ISCA Workshop on Multi-Modal Spoken Dialogue in Mobile Environments. European Speech Communication Association, Kloster Irsee, Germany. Bonn
Sturm, J., den Os, E., Boves, L., 1999. Issues in spoken dialogue systems: Experiences with the Dutch ARISE system. In: Proceedings of the ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Kloster Irsee, Germany. pp. 1–4
Temem, J., Lamel, L., Gauvain, J., 1999. The MASK demonstrator: An emerging technology for user-friendly passengers kiosk. In: World Congress on Railway Research
Wahlster, W., 1993. Verbmobil: Translation of face-to-face dialogues. In: Machine Translation Summit IV, Kobe, Japan
Wahlster, W., Reithinger, N., Blocher, A., 2001. SmartKom: Multimodal communication with a life-like character. In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark. pp. 1547–1550
Walker, M., Hirschman, L., Aberdeen, J., 2000a. Evaluation for DARPA COMMUNICATOR spoken dialogue systems. In: Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece. pp. 735–741
Walker, M., Kamm, C., Litman, D., 2000b. Towards developing general models of usability with PARADISE. Natural Language Engineering, Special Issue on Spoken Dialogue Systems 6 (3)
Walker, M., Litman, D., Kamm, C., Abella, A., 1997. PARADISE: A general framework for evaluating spoken dialogue agents. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL/EACL 1997). pp. 271–280
Walker, M., Rudnicky, A., Prasad, R., Aberdeen, J., Bratt, E., Garofolo, J., Hastie, H., Le, A., Pellom, B., Potamianos, A., Passonneau, R., Roukos, S., Sanders, G., Seneff, S., Stallard, D., 2002. DARPA Communicator: Cross-system results for the 2001 evaluation. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP 2002), Denver. pp. 269–272
Young, 1997, Multilingual large vocabulary speech recognition: The European SQALE project, Computer Speech and Language, 11, 73, 10.1006/csla.1996.0023