Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law

Daniel Vale1, Ali El-Sharif2, M. Syed Ali3
1Leiden University, Leiden, The Netherlands
2College of Computing and Engineering, Nova Southeastern University, Fort Lauderdale, USA
3UCL Knowledge Lab, University College London, London, UK

Abstract

Keywords


References

Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018). https://doi.org/10.1109/access.2018.2870052

Ahmad, M.A., Eckert, C., Teredesai, A.: Interpretable machine learning in healthcare. In: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. https://doi.org/10.1145/3233547.3233667 (2018)

Alom, M., Taha, T., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M., Asari, V., et al.: The history began from AlexNet: a comprehensive survey on deep learning approaches. https://arxiv.org/abs/1803.01164 (2018)

Angwin, J., Larson, J., Mattu, S., Kirchner, L.: Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing (2016)

Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., Müller, K.-R.: How to explain individual classification decisions. J. Mach. Learn. Res. 11, 1803–1831 (2010). https://dl.acm.org/doi/pdf/10.5555/1756006.1859912

Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., Herrera, F.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58, 82–115 (2020). https://doi.org/10.1016/j.inffus.2019.12.012

Bodria, F., Giannotti, F., Guidotti, R., Naretto, F., Pedreschi, D., Rinzivillo, S.: Benchmarking and survey of explanation methods for black box models. https://arxiv.org/pdf/2102.13076.pdf (2021)

Bratko, I.: Machine learning: between accuracy and interpretability. In: Della Riccia, G., Lenz, H.-J., Kruse, R. (eds.) Learning, Networks and Statistics. ICMS, vol. 382, pp. 163–177. Springer, Vienna (1997)

Breiman, L.: Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci. (2001). https://doi.org/10.1214/ss/1009213726

Burkart, N., Huber, M.F.: A survey on the explainability of supervised machine learning. J. Artif. Intell. Res. 70, 245–317 (2021). https://doi.org/10.1613/jair.1.12228

Burrell, J.: How the machine ‘thinks’: understanding opacity in machine learning algorithms. Big Data Soc. 3(1), 2053951715622512 (2016). https://doi.org/10.1177/2053951715622512

Camburu, O., Giunchiglia, E., Foerster, J., Lukasiewicz, T., Blunsom, P.: Can I trust the explainer? Verifying post-hoc explanatory methods. https://arxiv.org/abs/1910.02065 (2019)

Carvalho, D.V., Pereira, E.M., Cardoso, J.S.: Machine learning interpretability: a survey on methods and metrics. Electronics 8(8), 832 (2019). https://doi.org/10.3390/electronics8080832

Choi, E., Bahadori, M., Kulas, J., Schuetz, A., Stewart, W., Sun, J.: RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. Adv. Neural Inf. Process. Syst. 29, 3504–3512 (2016). https://arxiv.org/abs/1608.05745

Council of Europe: European Court of Human Rights: Handbook on European non-discrimination law. Council of Europe: European Court of Human Rights, Strasbourg (2018)

Covert, I., Lundberg, S., Lee, S.: Explaining by removing: a unified framework for model explanation. https://arxiv.org/abs/2011.14878 (2020)

Cranor, L.: A framework for reasoning about the human in the loop. https://www.usenix.org/legacy/event/upsec/tech/full_papers/cranor/cranor.pdf (2008)

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/cvpr.2009.5206848 (2009)

Douglas-Scott, S.: The European Union and human rights after the Treaty of Lisbon. Hum. Rights Law Rev. 11(4), 645–682 (2011). https://doi.org/10.1093/hrlr/ngr038

Doyle, O.: Direct discrimination, indirect discrimination and autonomy. Oxf. J. Leg. Stud. 27(3), 537–553 (2007). https://doi.org/10.1093/ojls/gqm008

Du, M., Liu, N., Hu, X.: Techniques for interpretable machine learning. Commun. ACM 63(1), 68–77 (2019). https://doi.org/10.1145/3359786

Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS '12). https://doi.org/10.1145/2090236.2090255 (2012)

Dwork, C., Immorlica, N., Kalai, A.T., Leiserson, M.: Decoupled classifiers for fair and efficient machine learning. https://arxiv.org/abs/1707.06613 (2017)

Ellis, E., Watson, P.: Key concepts in EU anti-discrimination law. EU Anti-Discrimination Law (2012). https://doi.org/10.1093/acprof:oso/9780199698462.003.0004

Ernst, C.: Artificial intelligence and autonomy: self-determination in the age of automated systems. Regul. Artif. Intell. (2019). https://doi.org/10.1007/978-3-030-32361-5_3

Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian, S.: Certifying and removing disparate impact. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2783258.2783311 (2015)

Floridi, L., Chiriatti, M.: GPT-3: its nature, scope, limits, and consequences. Mind. Mach. 30(4), 681–694 (2020). https://doi.org/10.1007/s11023-020-09548-1

Foster, K.R., Koprowski, R., Skufca, J.D.: Machine learning, medical diagnosis, and biomedical engineering research - commentary. Biomed. Eng. Online 13(1), 94 (2014). https://doi.org/10.1186/1475-925x-13-94

Gerards, J., Xenidis, R.: Algorithmic discrimination in Europe: Challenges and opportunities for gender equality and non-discrimination law. Publications Office of the European Union (2021)

Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., Kagal, L.: Explaining explanations: an overview of interpretability of machine learning. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). https://doi.org/10.1109/dsaa.2018.00018 (2018)

Girasa, R.: AI US policies and regulations. Artif. Intell. Disrupt. Technol. (2020). https://doi.org/10.1007/978-3-030-35975-1_3

Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 1–42 (2019). https://doi.org/10.1145/3236009

Guiraudon, V.: Equality in the making: implementing European non-discrimination law. Citizsh. Stud. 13(5), 527–549 (2009). https://doi.org/10.1080/13621020903174696

Hall, P., Gill, N., Schmidt, P.: Proposed guidelines for the responsible use of explainable machine learning. https://arxiv.org/abs/1906.03533 (2019)

Hall, P., Gill, N., Kurka, M., Phan, W.: Machine learning interpretability with H2O Driverless AI. H2O.ai, Mountain View. https://www.h2o.ai/wp-content/uploads/2017/09/MLI.pdf (2017)

Hand, D.J.: Classifier technology and the illusion of progress. Stat. Sci. (2006). https://doi.org/10.1214/088342306000000060

Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems. https://arxiv.org/abs/1610.02413 (2016)

Kantola, J., Nousiainen, K.: The European Union: initiator of a new European anti-discrimination regime? In: Krizsan, A., Skjeie, H., Squires, J. (eds.) Institutionalizing Intersectionality: The Changing Nature of European Equality Regimes. Palgrave Macmillan (2012)

Larson, J., Mattu, S., Kirchner, L., Angwin, J.: How we analyzed the COMPAS recidivism algorithm. ProPublica 1–16 (2016). https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

Laugel, T., Lesot, M., Marsala, C., Renard, X., Detyniecki, M.: The dangers of post-hoc interpretability: unjustified counterfactual explanations. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/388 (2019)

Lundberg, S.M., Erion, G., Chen, H., DeGrave, A., Prutkin, J.M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., Lee, S.: From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2(1), 56–67 (2020). https://doi.org/10.1038/s42256-019-0138-9

Mair, J.: Direct discrimination: limited by definition? Int. J. Discrim. Law 10(1), 3–17 (2009). https://doi.org/10.1177/135822910901000102

Maliszewska-Nienartowicz, J.: Direct and indirect discrimination in European Union law—how to draw a dividing line? Int. J. Soc. Sci. 3(1), 41–55 (2014). https://www.iises.net/download/Soubory/soubory-puvodni/pp041-055_ijoss_2014v3n1.pdf

Meske, C., Bunde, E.: Transparency and trust in human–AI-interaction: the role of model-agnostic explanations in computer vision-based decision support. Artif. Intell. HCI (2020). https://doi.org/10.1007/978-3-030-50334-5_4

Molnar, C., König, G., Herbinger, J., Freiesleben, T., Dandl, S., Scholbeck, C.A., Casalicchio, G., Grosse-Wentrup, M., Bischl, B.: Pitfalls to avoid when interpreting machine learning models. https://arxiv.org/abs/2007.04131 (2020)

Montavon, G., Lapuschkin, S., Binder, A., Samek, W., Müller, K.: Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recogn. 65, 211–222 (2017). https://doi.org/10.1016/j.patcog.2016.11.008

Murdoch, W.J., Singh, C., Kumbier, K., Abbasi-Asl, R., Yu, B.: Definitions, methods, and applications in interpretable machine learning. Proc. Natl. Acad. Sci. 116(44), 22071–22080 (2019). https://doi.org/10.1073/pnas.1900654116

Narayanan, A.: Translation tutorial: 21 fairness definitions and their politics. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. https://fairmlbook.org/tutorial2.html (2018)

Nie, L., Wang, M., Zhang, L., Yan, S., Zhang, B., Chua, T.: Disease inference from health-related questions via sparse deep learning. IEEE Trans. Knowl. Data Eng. 27(8), 2107–2119 (2015). https://doi.org/10.1109/tkde.2015.2399298

Onishi, T., Saha, S.K., Delgado-Montero, A., Ludwig, D.R., Onishi, T., Schelbert, E.B., Schwartzman, D., Gorcsan, J.: Global longitudinal strain and global circumferential strain by speckle-tracking echocardiography and feature-tracking cardiac magnetic resonance imaging: comparison with left ventricular ejection fraction. J. Am. Soc. Echocardiogr. 28(5), 587–596 (2015). https://doi.org/10.1016/j.echo.2014.11.018

O’Sullivan, S., Nevejans, N., Allen, C., Blyth, A., Leonard, S., Pagallo, U., Holzinger, K., Holzinger, A., Sajid, M.I., Ashrafian, H.: Legal, regulatory, and ethical frameworks for development of standards in artificial intelligence (AI) and autonomous robotic surgery. Int. J. Med. Robot. Comput. Assist. Surg. 15(1), e1968 (2019). https://doi.org/10.1002/rcs.1968

Pasquale, F.: The black box society: the secret algorithms that control money and information. Harvard University Press, Cambridge, MA (2015). https://doi.org/10.4159/harvard.9780674736061

Pasquale, F.: Toward a fourth law of robotics: preserving attribution, responsibility, and explainability in an algorithmic society. Ohio State Law J. https://ssrn.com/abstract=3002546 (2017)

Pedreschi, D., Giannotti, F., Guidotti, R., Monreale, A., Ruggieri, S., Turini, F.: Meaningful explanations of black box AI decision systems. Proc. AAAI Conf. Artif. Intell. 33, 9780–9784 (2019). https://doi.org/10.1609/aaai.v33i01.33019780

Qian, K., Danilevsky, M., Katsis, Y., Kawas, B., Oduor, E., Popa, L., Li, Y.: XNLP: a living survey for XAI research in natural language processing. In: 26th International Conference on Intelligent User Interfaces. https://doi.org/10.1145/3397482.3450728 (2021)

Ribeiro, M., Singh, S., Guestrin, C.: Model-agnostic interpretability of machine learning. https://arxiv.org/abs/1606.05386 (2016)

Ribeiro, M., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2939672.2939778 (2016)

Ringelheim, J.: The burden of proof in antidiscrimination proceedings. A focus on Belgium, France and Ireland. Eur. Equal. Law Rev. (2019). https://ssrn.com/abstract=3498346

Rissland, E.: AI and legal reasoning. In: Proceedings of the 9th International Joint Conference on Artificial Intelligence. https://dl.acm.org/doi/abs/10.5555/1623611.1623724 (1985)

Rissland, E.L., Ashley, K.D., Loui, R.: AI and law: a fruitful synergy. Artif. Intell. 150(1–2), 1–15 (2003). https://doi.org/10.1016/s0004-3702(03)00122-x

Robnik-Šikonja, M., Bohanec, M.: Perturbation-based explanations of prediction models. Hum. Mach. Learn. (2018). https://doi.org/10.1007/978-3-319-90403-0_9

Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019). https://doi.org/10.1038/s42256-019-0048-x

Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247–278 (2021). https://doi.org/10.1109/JPROC.2021.3060483

Schwab, P., Karlen, W.: CXPlain: causal explanations for model interpretation under uncertainty. In: Advances in Neural Information Processing Systems. https://arxiv.org/abs/1910.12336 (2019)

Selbst, A.D., Barocas, S.: The intuitive appeal of explainable machines. SSRN Electron. J. (2018). https://doi.org/10.2139/ssrn.3126971

Suresh, H., Gong, J.J., Guttag, J.V.: Learning tasks for multitask learning: heterogenous patient populations in the ICU. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/3219819.3219930 (2018)

Suresh, H., Guttag, J.: A framework for understanding unintended consequences of machine learning. https://arxiv.org/abs/1901.10002 (2019)

Tan, S., Caruana, R., Hooker, G., Lou, Y.: Distill-and-compare: auditing black-box models using transparent model distillation. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. https://doi.org/10.1145/3278721.3278725 (2018)

Tischbirek, A.: Artificial intelligence and discrimination: discriminating against discriminatory systems. Regul. Artif. Intell. (2019). https://doi.org/10.1007/978-3-030-32361-5_5

VanderWeele, T.J., Hernán, M.A.: Results on differential and dependent measurement error of the exposure and the outcome using signed directed acyclic graphs. Am. J. Epidemiol. 175(12), 1303–1310 (2012). https://doi.org/10.1093/aje/kwr458

Verma, S., Rubin, J.: Fairness definitions explained. Proc. Int. Workshop Softw. Fairness (2018). https://doi.org/10.1145/3194770.3194776

Viljoen, S.: Democratic data: a relational theory for data governance. SSRN Electron. J. (2020). https://doi.org/10.2139/ssrn.3727562

Visani, G., Bagli, E., Chesani, F., Poluzzi, A., Capuzzo, D.: Statistical stability indices for LIME: obtaining reliable explanations for machine learning models. J. Oper. Res. Soc. (2021). https://doi.org/10.1080/01605682.2020.1865846

Wachter, S., Mittelstadt, B., Russell, C.: Why fairness cannot be automated: bridging the gap between EU non-discrimination law and AI. SSRN Electron. J. (2020). https://doi.org/10.2139/ssrn.3547922

Wachter, S., Mittelstadt, B., Russell, C.: Bias preservation in machine learning: the legality of fairness metrics under EU non-discrimination law. SSRN Electron. J. (2021). https://doi.org/10.2139/ssrn.3792772

Wang, W., Siau, K.: Artificial intelligence: a study on governance, policies, and regulations. Association for Information Systems AIS Electronic Library. http://aisel.aisnet.org/mwais2018/40 (2018)

Wischmeyer, T.: Artificial intelligence and transparency: opening the black box. Regul. Artif. Intell. (2019). https://doi.org/10.1007/978-3-030-32361-5_4

Wischmeyer, T., Rademacher, T. (eds.): Regulating Artificial Intelligence. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-32361-5

Zafar, M., Khan, N.: DLIME: a deterministic local interpretable model-agnostic explanations approach for computer-aided diagnosis systems. https://arxiv.org/abs/1906.10263 (2019)

Zemel, R., Wu, Y., Swersky, K., Pitassi, T., Dwork, C.: Learning fair representations. In: International Conference on Machine Learning, pp. 325–333. PMLR. https://proceedings.mlr.press/v28/zemel13.html (2013)

Zhang, Y., Song, S., Sun, Y., Tan, S., Udell, M.: "Why should you trust my explanation?" Understanding uncertainty in LIME explanations. https://arxiv.org/abs/1904.12991 (2019)

Zuiderveen Borgesius, F.J.: Discrimination, artificial intelligence, and algorithmic decision-making. Council of Europe, Directorate General of Democracy. https://rm.coe.int/discrimination-artificial-intelligence-and-algorithmic-decision-making/1680925d73 (2018)

Zuiderveen Borgesius, F.J.: Strengthening legal protection against discrimination by algorithms and artificial intelligence. Int. J. Hum. Rights 24(10), 1572–1593 (2020). https://doi.org/10.1080/13642987.2020.1743976