Wasserstein-based fairness interpretability framework for machine learning models
Abstract
The objective of this article is to introduce a fairness interpretability framework for measuring and explaining bias in classification and regression models at the level of the output distribution. We measure model bias across sub-population distributions of the model output using the Wasserstein metric. To properly quantify the contributions of predictors, we account for the favorability of both the model and the predictors with respect to the non-protected class. The quantification is accomplished via optimal transport theory, which yields a decomposition of the model bias and the bias explanations into positive and negative contributions. To gain further insight into the role of favorability and to ensure additivity of the bias explanations, we adapt techniques from cooperative game theory.
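As a minimal illustration of the distribution-level bias measure described above (a sketch on synthetic data, not the authors' implementation; the function name `model_bias_w1` and the Gaussian score distributions are assumptions made here), the Wasserstein-1 distance between two sub-population score distributions can be computed from their empirical quantile functions, and the sign of the quantile difference splits the bias into positive and negative transport contributions:

```python
# Illustrative sketch (not the paper's code): Wasserstein-1 model bias between
# protected and non-protected sub-population score distributions, decomposed
# into positive and negative contributions via the quantile (inverse CDF)
# representation W1 = ∫ |F_np^{-1}(p) - F_p^{-1}(p)| dp.
import numpy as np

def model_bias_w1(scores_prot, scores_nonprot, n_quantiles=1000):
    """Return (total, positive, negative) W1-based bias.

    The positive part accumulates mass where the non-protected quantile
    exceeds the protected one (i.e., the model favors the non-protected
    class, assuming higher scores are favorable)."""
    p = (np.arange(n_quantiles) + 0.5) / n_quantiles  # quantile grid
    diff = np.quantile(scores_nonprot, p) - np.quantile(scores_prot, p)
    pos = np.mean(np.maximum(diff, 0.0))   # transport favoring non-protected
    neg = np.mean(np.maximum(-diff, 0.0))  # transport favoring protected
    return pos + neg, pos, neg

# Synthetic model scores for the two sub-populations.
rng = np.random.default_rng(0)
s_prot = rng.normal(0.45, 0.10, 5000)
s_nonprot = rng.normal(0.55, 0.10, 5000)
total, pos, neg = model_bias_w1(s_prot, s_nonprot)
```

Because the integrand is split by sign, the positive and negative parts always sum to the total Wasserstein-1 bias, which is the additivity property the decomposition relies on.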