“Let me explain!”: exploring the potential of virtual agents in explainable AI interaction design

Journal on Multimodal User Interfaces - Volume 15, Issue 2, pp. 87-98 - 2021
Katharina Weitz1, Dominik Schiller1, Ruben Schlagowski1, Tobias B. Huber1, Elisabeth André1
1Department of Computer Science, Human-Centered Multimedia, Augsburg University, Universitätsstraße 6a, Augsburg, Germany

Abstract

While the research area of artificial intelligence has benefited from increasingly sophisticated machine learning techniques in recent years, the resulting systems suffer from a loss of transparency and comprehensibility, especially for end-users. In this paper, we explore the effects of incorporating virtual agents into explainable artificial intelligence (XAI) designs on the perceived trust of end-users. For this purpose, we conducted a user study based on a simple speech recognition system for keyword classification. As a result of this experiment, we found that the integration of virtual agents leads to increased user trust in the XAI system. Furthermore, we found that the user’s trust significantly depends on the modalities used within the user-agent interface design. The results of our study show a linear trend in which the visual presence of an agent combined with voice output resulted in greater trust than text output or voice output alone. Additionally, we analysed the participants’ feedback regarding the presented XAI visualisations. We found that increased human-likeness of, and interaction with, the virtual agent were the two most commonly mentioned suggestions for improving the proposed XAI interaction design. Based on these results, we discuss current limitations and interesting topics for further research in the field of XAI. Moreover, we present design recommendations for virtual agents in XAI systems for future projects.
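To make the described setup more concrete, the following is a minimal, hypothetical sketch of the kind of pipeline the abstract refers to: a small convolutional network that classifies keywords from log-mel spectrograms, paired with a gradient-based saliency map as one possible XAI visualisation. The architecture, keyword set, input shape, and the choice of vanilla gradient saliency are illustrative assumptions, not the authors' actual system.

```python
# Hypothetical sketch, not the system used in the paper: a small CNN keyword
# classifier over log-mel spectrograms plus a gradient-based saliency map
# as a simple XAI visualisation of which time-frequency regions drove the decision.
import torch
import torch.nn as nn

KEYWORDS = ["yes", "no", "up", "down", "left", "right"]  # assumed label set

class KeywordCNN(nn.Module):
    """Small CNN for keyword classification on (batch, 1, n_mels, n_frames) inputs."""
    def __init__(self, n_classes: int = len(KEYWORDS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.LazyLinear(n_classes)  # infers flattened feature size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(start_dim=1))

def saliency_map(model: nn.Module, spectrogram: torch.Tensor) -> torch.Tensor:
    """Absolute gradient of the winning class score w.r.t. the input spectrogram
    (vanilla gradient saliency); larger values mark regions that influenced the
    prediction more strongly."""
    model.eval()
    x = spectrogram.clone().requires_grad_(True)
    logits = model(x)
    predicted = logits.argmax(dim=1).item()
    logits[0, predicted].backward()
    return x.grad.abs().squeeze()

if __name__ == "__main__":
    model = KeywordCNN()
    dummy = torch.randn(1, 1, 64, 101)       # one log-mel spectrogram (assumed shape)
    heatmap = saliency_map(model, dummy)
    print(heatmap.shape)                      # torch.Size([64, 101])
```

In an interaction design like the one studied, such a heatmap would be rendered next to the classifier's prediction, with or without a virtual agent presenting it via text or voice.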
