Usefulness, localizability, humanness, and language-benefit: additional evaluation criteria for natural language dialogue systems

International Journal of Speech Technology, Volume 19, pages 373–383, 2016
Bayan AbuShawar1, Eric Atwell2
1IT Department, Arab Open University, Amman, Jordan
2School of Computing, University of Leeds, Leeds, UK

Abstract

Human–computer dialogue systems interact with human users using natural language. We used the ALICE/AIML chatbot architecture as a platform to develop a range of chatbots covering different languages, genres, text-types, and user-groups, to illustrate qualitative aspects of natural language dialogue system evaluation. We present some of the evaluation techniques used for natural language dialogue systems, including black-box and glass-box, comparative, quantitative, and qualitative evaluation. Four aspects of NLP dialogue system evaluation are often overlooked: “usefulness” in terms of a user’s qualitative needs, “localizability” to new genres and languages, “humanness” or “naturalness” compared to human–human dialogues, and “language benefit” compared to alternative interfaces. We illustrate these aspects with respect to our work on machine-learnt chatbot dialogue systems; we believe these aspects are valuable for impressing potential new users and customers.
