Attention-like feature explanation for tabular data

Andrei V. Konstantinov1, Lev V. Utkin1
1Higher School of Artificial Intelligence, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia

Abstract

A new method for local and global explanation of machine learning black-box model predictions for tabular data is proposed. It is implemented as a system called AFEX (Attention-like Feature EXplanation), which consists of two main parts. The first part is a set of one-feature neural subnetworks, each of which produces a specific representation of its feature in the form of a basis of shape functions. The subnetworks use shortcut connections with trainable parameters to improve training performance. The second part of AFEX produces the shape function of each feature as a weighted sum of the basis shape functions, where the weights are computed by an attention-like mechanism. The most important advantage of AFEX is that it identifies pairwise interactions between features through pairwise products of the shape functions corresponding to different features. A modification of AFEX incorporating an additional surrogate model, which approximates the black-box model, is also proposed. AFEX is trained end-to-end on the whole dataset only once, so the neural networks do not need to be retrained at the explanation stage. Numerical experiments with synthetic and real data illustrate AFEX. The code implementing the method is publicly available.
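The core idea of the abstract — a per-feature shape function built as an attention-weighted sum of basis shape functions, with pairwise interactions taken as products of two features' shape functions — can be sketched as follows. This is a minimal illustrative sketch in NumPy: the fixed basis functions and the score vectors are stand-ins for the outputs and trainable parameters of AFEX's one-feature subnetworks, not the actual trained model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax, producing attention-like weights."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical basis of shape functions for a feature (in AFEX these
# would be produced by a trained one-feature neural subnetwork).
basis = [np.sin, np.cos, lambda x: x ** 2]

def shape_function(x, scores):
    """Shape function of a feature: weighted sum of basis shape
    functions, with weights given by a softmax over score parameters."""
    w = softmax(scores)
    return sum(wi * g(x) for wi, g in zip(w, basis))

# Additive prediction over two features, plus a pairwise-interaction
# term formed by multiplying the two per-feature shape functions.
x1, x2 = 0.5, -1.0
s1 = shape_function(x1, np.array([1.0, 0.0, -1.0]))
s2 = shape_function(x2, np.array([0.0, 1.0, 0.0]))
prediction = s1 + s2 + s1 * s2
```

Because the softmax weights sum to one, each resulting shape function stays on the scale of its basis functions, and the product term `s1 * s2` gives a directly inspectable candidate interaction between the two features.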
