Benchmarking and survey of explanation methods for black box models

Data Mining and Knowledge Discovery - Volume 37 - Pages 1719-1778 - 2023
Francesco Bodria¹, Fosca Giannotti¹, Riccardo Guidotti², Francesca Naretto¹, Dino Pedreschi², Salvatore Rinzivillo³
¹Scuola Normale Superiore, Pisa, Italy
²University of Pisa, Pisa, Italy
³ISTI-CNR, Pisa, Italy

Abstract

The rise of sophisticated black-box machine learning models in Artificial Intelligence systems has prompted the need for explanation methods that reveal how these models work in a way that users and decision makers can understand. Unsurprisingly, the state of the art currently exhibits a plethora of explainers providing many different types of explanations. With the aim of providing a compass for researchers and practitioners, this paper proposes a categorization of explanation methods from the perspective of the type of explanation they return, while also considering the different input data formats. The paper covers the most representative explainers to date and discusses similarities and discrepancies among the returned explanations through their visual appearance. A companion website to the paper is provided and continuously updated with new explainers as they appear. Moreover, a subset of the most robust and widely adopted explainers is benchmarked against a repertoire of quantitative metrics.
