Entropy-regularized 2-Wasserstein distance between Gaussian measures

Information Geometry - Volume 5 - Pages 289-323 - 2021
Anton Mallasto (1), Augusto Gerolin (2), Hà Quang Minh (3)
(1) Department of Computer Science, Aalto University, Helsinki, Finland
(2) Department of Theoretical Chemistry, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
(3) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

Abstract

Gaussian distributions are plentiful in applications dealing with uncertainty quantification and diffusivity. They also stand as important special cases for frameworks that provide geometries for probability measures, as the resulting geometry on Gaussians is often expressible in closed form under these frameworks. In this work, we study the Gaussian geometry under the entropy-regularized 2-Wasserstein distance by providing closed-form solutions for the distance and for interpolations between elements. Furthermore, we provide a fixed-point characterization of a population barycenter when restricted to the manifold of Gaussians, which allows computation through a fixed-point iteration algorithm. As a consequence, the results yield closed-form expressions for the 2-Sinkhorn divergence. As the geometries change with the regularization magnitude, we study the limiting cases of vanishing and infinite magnitudes, reconfirming well-known results on the limits of the Sinkhorn divergence. Finally, we illustrate the resulting geometries with a numerical study.
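For orientation, the sketch below illustrates the classical, unregularized setting that the vanishing-regularization limit refers to. It is not the entropy-regularized formula of the paper, which the abstract does not state. It assumes the well-known closed form W_2^2 = |m0 - m1|^2 + Tr(K0) + Tr(K1) - 2 Tr((K0^{1/2} K1 K0^{1/2})^{1/2}) for Gaussians N(m0, K0) and N(m1, K1), together with the standard fixed-point iteration for the covariance of the unregularized Gaussian barycenter. The function names, the initial guess, and the stopping tolerance are hypothetical choices made for illustration.

```python
import numpy as np


def sqrtm_psd(a):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def w2_squared_gaussians(m0, k0, m1, k1):
    """Classical (unregularized) squared 2-Wasserstein distance
    between N(m0, k0) and N(m1, k1)."""
    r0 = sqrtm_psd(k0)
    cross = sqrtm_psd(r0 @ k1 @ r0)
    mean_term = np.sum((np.asarray(m0) - np.asarray(m1)) ** 2)
    cov_term = np.trace(k0) + np.trace(k1) - 2.0 * np.trace(cross)
    return float(mean_term + cov_term)


def w2_barycenter_cov(covs, weights, n_iter=100, tol=1e-12):
    """Fixed-point iteration for the covariance of the unregularized
    2-Wasserstein barycenter of centered Gaussians.
    Assumes every covariance is strictly positive definite."""
    s = np.mean(covs, axis=0)  # hypothetical initial guess: arithmetic mean
    for _ in range(n_iter):
        r = sqrtm_psd(s)
        r_inv = np.linalg.inv(r)
        # Weighted sum of (S^{1/2} K_i S^{1/2})^{1/2}
        t = sum(w * sqrtm_psd(r @ k @ r) for w, k in zip(weights, covs))
        s_next = r_inv @ t @ t @ r_inv
        if np.linalg.norm(s_next - s) < tol:
            return s_next
        s = s_next
    return s


if __name__ == "__main__":
    k0, k1 = np.diag([1.0, 4.0]), np.diag([9.0, 16.0])
    print(w2_squared_gaussians(np.zeros(2), k0, np.array([3.0, 4.0]), k1))
    print(w2_barycenter_cov([k0, k1], [0.5, 0.5]))
```

For commuting (here diagonal) covariances the barycenter covariance reduces to the square of the weighted mean of the matrix square roots, which gives a simple sanity check: the two diagonal inputs above yield diag(4, 9), and the distance evaluates to 33 up to floating-point error.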
