Coupling matrix manifolds assisted optimization for optimal transport problems

Machine Learning - Volume 110 - Pages 533-558 - 2021
Dai Shi1, Junbin Gao1, Xia Hong2, S. T. Boris Choy1, Zhiyong Wang3
1Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Sydney, Australia
2Department of Computer Science, University of Reading, Reading, UK
3School of Computer Science, The University of Sydney, Sydney, Australia

Abstract

Optimal transport (OT) is a powerful tool for measuring the distance between two probability distributions. In this paper, we introduce a new manifold, named the coupling matrix manifold (CMM), in which each point can be regarded as a transportation plan of an optimal transport problem. We first explore the Riemannian geometry of the CMM under a metric derived from the Fisher information. These geometric features can then be exploited within standard optimization methods, yielding a framework that solves all types of OT problems by running numerical Riemannian optimization algorithms, such as gradient descent and trust-region methods, on the CMM. The proposed approach is validated on several OT problems against recent state-of-the-art methods. For the classic OT problem and its entropy-regularized variant, our method is shown to be comparable to classic algorithms such as linear programming and the Sinkhorn algorithm. For other, non-entropy-regularized OT problems, the proposed method outperforms existing approaches, which do not incorporate the geometric information of the OT feasible space.
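
To make the idea concrete, the following minimal Python/NumPy sketch illustrates Riemannian gradient descent over coupling matrices (positive matrices with fixed row and column marginals) under a Fisher-information metric of the form <xi, eta>_X = sum_ij xi_ij eta_ij / X_ij, applied to an entropy-regularized OT objective. It is an illustrative assumption-based sketch, not the authors' implementation: the function names (sinkhorn_projection, tangent_projection, riemannian_gradient_descent), the Sinkhorn-style retraction, and the step size are all hypothetical choices made for exposition.

# Illustrative sketch (not the paper's code): Riemannian gradient descent on
# the interior of the transportation polytope, assuming the Fisher metric
#   <xi, eta>_X = sum_ij xi_ij * eta_ij / X_ij
# and a Sinkhorn-style retraction that restores the marginal constraints.
import numpy as np

def sinkhorn_projection(K, p, q, n_iter=200):
    """Scale a positive matrix K so its row/column sums match p and q."""
    u, v = np.ones_like(p), np.ones_like(q)
    for _ in range(n_iter):
        u = p / (K @ v)
        v = q / (K.T @ u)
    return u[:, None] * K * v[None, :]

def tangent_projection(X, Y, p, q):
    """Project an ambient matrix Y onto the tangent space at X (matrices with
    zero row/column sums), orthogonally w.r.t. the Fisher metric. The result
    has the form Y - (alpha 1^T + 1 beta^T) * X, where (alpha, beta) solve a
    rank-deficient linear system; least squares handles the deficiency."""
    m, n = X.shape
    A = np.zeros((m + n, m + n))
    A[:m, :m] = np.diag(X.sum(axis=1))   # alpha_i * (row sum)_i
    A[:m, m:] = X                        # (X beta)_i
    A[m:, :m] = X.T                      # (X^T alpha)_j
    A[m:, m:] = np.diag(X.sum(axis=0))   # beta_j * (column sum)_j
    b = np.concatenate([Y.sum(axis=1), Y.sum(axis=0)])
    sol = np.linalg.lstsq(A, b, rcond=None)[0]
    alpha, beta = sol[:m], sol[m:]
    return Y - (alpha[:, None] + beta[None, :]) * X

def riemannian_gradient_descent(C, p, q, reg=0.05, step=0.1, n_steps=300):
    """Minimize <C, X> + reg * sum X*(log X - 1) over couplings of (p, q)."""
    X = sinkhorn_projection(np.outer(p, q), p, q)          # feasible start
    for _ in range(n_steps):
        egrad = C + reg * np.log(X)                        # Euclidean gradient
        rgrad = tangent_projection(X, X * egrad, p, q)     # Riemannian gradient
        # Retraction: multiplicative step along -rgrad, then restore marginals.
        expo = np.clip(-step * rgrad / X, -30.0, 30.0)
        X = sinkhorn_projection(X * np.exp(expo), p, q)
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.full(5, 1 / 5)
    q = np.full(7, 1 / 7)
    C = rng.random((5, 7))
    X = riemannian_gradient_descent(C, p, q)
    print("transport cost:", np.sum(C * X))
    print("marginal errors:", np.abs(X.sum(1) - p).max(), np.abs(X.sum(0) - q).max())

The multiplicative update followed by a Sinkhorn rescaling keeps every iterate strictly positive and feasible, which is the practical appeal of treating the set of coupling matrices as a manifold rather than handling the marginal constraints with generic constrained solvers.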
