Usefulness, localizability, humanness, and language-benefit: additional evaluation criteria for natural language dialogue systems

International Journal of Speech Technology, Volume 19, pages 373–383, 2016
Bayan AbuShawar1, Eric Atwell2
1IT Department, Arab Open University, Amman, Jordan
2School of Computing, University of Leeds, Leeds, UK

Abstract

Human–computer dialogue systems interact with human users using natural language. We used the ALICE/AIML chatbot architecture as a platform to develop a range of chatbots covering different languages, genres, text-types, and user-groups, to illustrate qualitative aspects of natural language dialogue system evaluation. We present some of the evaluation techniques used for natural language dialogue systems, including black-box and glass-box, comparative, quantitative, and qualitative evaluation. Four aspects of NLP dialogue system evaluation are often overlooked: “usefulness” in terms of a user’s qualitative needs, “localizability” to new genres and languages, “humanness” or “naturalness” compared to human–human dialogues, and “language benefit” compared to alternative interfaces. We illustrate these aspects with respect to our work on machine-learnt chatbot dialogue systems; we believe these aspects are valuable for impressing potential new users and customers.
