Information geometry connecting Wasserstein distance and Kullback–Leibler divergence via the entropy-relaxed transportation problem

Information Geometry - Volume 1 - Pages 13–37 - 2018
Shun-ichi Amari1,2, Ryo Karakida3, Masafumi Oizumi1,2
1RIKEN Brain Science Institute, Saitama, Japan
2Araya, Inc., Tokyo, Japan
3National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan

Abstract

Two geometrical structures have been extensively studied for a manifold of probability distributions. One is based on the Fisher information metric, which is invariant under reversible transformations of random variables, while the other is based on the Wasserstein distance of optimal transportation, which reflects the distance structure of the underlying random variables. Here, we propose a new information-geometrical theory that provides a unified framework connecting the Wasserstein distance and the Kullback–Leibler (KL) divergence. We primarily consider the discrete case of $n$ elements and study the geometry of the probability simplex $S_{n-1}$, the set of all probability distributions over $n$ elements. The Wasserstein distance is introduced in $S_{n-1}$ through the optimal transportation of commodities from a distribution $\boldsymbol{p}$ to a distribution $\boldsymbol{q}$, where $\boldsymbol{p}, \boldsymbol{q} \in S_{n-1}$. Following Cuturi, we relax the optimal transportation problem by adding an entropy term; the optimal solution is called the entropy-relaxed stochastic transportation plan. The entropy-relaxed optimal cost $C(\boldsymbol{p}, \boldsymbol{q})$ is computationally much less demanding than the original Wasserstein distance, but it does not define a distance because it is not minimized at $\boldsymbol{p} = \boldsymbol{q}$. To define a proper divergence while retaining the computational advantage, we first introduce a divergence function on the manifold $S_{n-1} \times S_{n-1}$ of all optimal transportation plans. We fully explore the information geometry of this manifold and then construct a new one-parameter family of divergences in $S_{n-1}$ that are related to both the Wasserstein distance and the KL divergence.
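For concreteness, a standard form of the entropy-relaxed transportation problem (a sketch following Cuturi's formulation; the exact normalization, sign convention, and notation used in the paper may differ) reads as follows. Given a cost matrix $M = (m_{ij})$ on the $n$ elements and a relaxation parameter $\lambda > 0$,

$$ C_{\lambda}(\boldsymbol{p}, \boldsymbol{q}) \;=\; \min_{P \ge 0,\; P\mathbf{1} = \boldsymbol{p},\; P^{\top}\mathbf{1} = \boldsymbol{q}} \left\{ \sum_{i,j} m_{ij} P_{ij} \;-\; \lambda H(P) \right\}, \qquad H(P) = -\sum_{i,j} P_{ij} \log P_{ij}, $$

where $P = (P_{ij})$ is a transportation plan whose marginals are $\boldsymbol{p}$ and $\boldsymbol{q}$. The minimizer has the form $P_{ij} \propto a_i b_j e^{-m_{ij}/\lambda}$ and can be computed by Sinkhorn's matrix-scaling iterations, which is the source of the computational advantage; as $\lambda \to 0$ the relaxed cost approaches the original transportation cost, while for $\lambda > 0$ it is generally not minimized at $\boldsymbol{p} = \boldsymbol{q}$.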

References

Santambrogio, F.: Optimal Transport for Applied Mathematicians. Birkhäuser, Basel (2015)

Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, pp. 2292–2300 (2013)

Cuturi, M., Peyré, G.: A smoothed dual approach for variational Wasserstein problems. SIAM J. Imaging Sci. 9, 320–343 (2016)

Amari, S., Tsuchiya, N., Oizumi, M.: Geometry of information integration (2017). arXiv:1709.02050

Oizumi, M., Albantakis, L., Tononi, G.: From the phenomenology to the mechanisms of consciousness: integrated information theory 3.0. PLoS Comput. Biol. 10, e1003588 (2014)

Muzellec, B., Nock, R., Patrini, G., Nielsen, F.: Tsallis regularized optimal transport and ecological inference (2016). arXiv:1609.04495v1

Amari, S., Karakida, R., Oizumi, M., Cuturi, M.: New divergence derived from Cuturi function (in preparation)