The Fisher–Rao loss for learning under label noise

Henrique K. Miyamoto1, Fábio C. C. Meneghetti2, Sueli I. R. Costa2
1University of Campinas (Unicamp), Campinas, Brazil
2Institute of Mathematics, Statistics and Scientific Computing (IMECC), University of Campinas (Unicamp), Campinas, Brazil


References

Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

Calin, O.: Deep Learning Architectures: A Mathematical Approach. Springer, Cham (2020)

Kline, D.M., Berardi, V.L.: Revisiting squared-error and cross-entropy functions for training neural network classifiers. Neural Comput. Appl. 14(4), 310–318 (2005)

Golik, P., Doetsch, P., Ney, H.: Cross-entropy vs. squared error training: a theoretical and experimental comparison. In: Proc. Interspeech, pp. 1756–1760 (2013)

Janocha, K., Czarnecki, W.M.: On loss functions for deep neural networks in classification. Schedae Informaticae 25 (2017)

Demirkaya, A., Chen, J., Oymak, S.: Exploring the role of loss functions in multiclass classification. In: Proc. 54th Annu. Conf. Inf. Sci. Syst. (CISS), pp. 1–5 (2020)

Hui, L., Belkin, M.: Evaluation of neural architectures trained with square loss vs cross-entropy in classification tasks. In: Proc. 9th Int. Conf. Learn. Representations (ICLR) (2021)

Singh, A., Príncipe, J.C.: A loss function for classification based on a robust similarity metric. In: Proc. Int. Joint Conf. Neural Netw. (IJCNN), pp. 1–6 (2010)

Frogner, C., Zhang, C., Mobahi, H., Araya-Polo, M., Poggio, T.: Learning with a Wasserstein loss. In: Proc. 29th Conf. Neural Inf. Process. Syst. (NIPS), pp. 2053–2061 (2015)

Hou, L., Yu, C.-P., Samaras, D.: Squared earth mover's distance loss for training deep neural networks on ordered classes. In: Proc. 31st Conf. Neural Inf. Process. Syst. (NIPS) (2017)

Clough, J., Byrne, N., Oksuz, I., Zimmer, V.A., Schnabel, J.A., King, A.: A topological loss function for deep-learning based image segmentation using persistent homology. IEEE Trans. Pattern Anal. Mach. Intell. (2020) (early access)

Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 845–869 (2014)

Sastry, P.S., Manwani, N.: Robust learning of classifiers in the presence of label noise. In: Pal, A., Pal, S.K. (eds.) Pattern Recognition and Big Data. World Scientific, New Jersey (2016)

Ghosh, A., Manwani, N., Sastry, P.S.: Making risk minimization tolerant to label noise. Neurocomputing 160, 93–107 (2015)

Ghosh, A., Kumar, H., Sastry, P.S.: Robust loss functions under label noise for deep neural networks. In: Proc. 31st AAAI Conf. Artif. Intell., pp. 1919–1925 (2017)

Kumar, H., Sastry, P.S.: Robust loss functions for learning multi-class classifiers. In: Proc. IEEE Int. Conf. Syst. Man Cybern. (SMC), pp. 687–692 (2018)

Zhang, Z., Sabuncu, M.: Generalized cross entropy loss for training deep neural networks with noisy labels. In: Proc. 32nd Conf. Neural Inf. Process. Syst. (NeurIPS) (2018)

Amari, S.: Natural gradient works efficiently in learning. Neural Comput. 10(2), 251–276 (1998)

Amari, S., Nagaoka, H.: Methods of Information Geometry. American Mathematical Society, Providence (2000)

Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information geometry and sufficient statistics. Probab. Theory Relat. Fields 162(1), 327–364 (2015)

Gattone, S.A., De Sanctis, A., Russo, T., Pulcini, D.: A shape distance based on the Fisher–Rao metric and its application for shapes clustering. Phys. A Stat. Mech. Appl. 487, 93–102 (2017)

Taylor, S.: Clustering financial return distributions using the Fisher information metric. Entropy 21(2) (2019)

Pinele, J., Strapasson, J.E., Costa, S.I.R.: The Fisher–Rao distance between multivariate normal distributions: special cases, bounds and applications. Entropy 22(4) (2020)

Picot, M., Messina, F., Boudiaf, M., Labeau, F., Ayed, I.B., Piantanida, P.: Adversarial robustness via Fisher–Rao regularization. IEEE Trans. Pattern Anal. Mach. Intell. (2022) (early access)

Gomes, E.D.C., Alberge, F., Duhamel, P., Piantanida, P.: Igeood: an information geometry approach to out-of-distribution detection. In: Proc. Int. Conf. Learn. Representations (ICLR) (2022)

Arvanitidis, G., González-Duque, M., Pouplin, A., Kalatzis, D., Hauberg, S.: Pulling back information geometry. In: Proc. 25th Int. Conf. Artif. Intell. Stat. (AISTATS), pp. 4872–4894 (2022)

Atkinson, C., Mitchell, A.F.S.: Rao's distance measure. Sankhyā Indian J. Stat. Ser. A 43(3), 345–365 (1981)

Calin, O., Udrişte, C.: Geometric Modeling in Probability and Statistics. Springer, Cham (2014)

Costa, S.I.R., Santos, S.A., Strapasson, J.E.: Fisher information distance: a geometrical reading. Discrete Appl. Math. 197, 59–69 (2015)

Kass, R.E., Vos, P.W.: Geometrical Foundations of Asymptotic Inference. Wiley, New York (1997)

Tsybakov, A.B.: Introduction to Nonparametric Estimation. Springer, New York (2009)

Tsallis, C.: What are the numbers that experiments provide? Quim. Nova 17, 468–471 (1994)

Manwani, N., Sastry, P.S.: Noise tolerance under risk minimization. IEEE Trans. Cybern. 43(3), 1146–1151 (2013)

Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

LeCun, Y., Cortes, C., Burges, C.J.C.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

Nix, D., Weigend, A.: Estimating the mean and variance of the target probability distribution. In: Proc. IEEE Int. Conf. Neural Netw. (ICNN), vol. 1, pp. 55–60 (1994)