Conservative Parametric Optimality and the Ridge Method for Tame Min-Max Problems
Abstract
We study the ridge method for min-max problems and investigate its convergence without any convexity, differentiability, or qualification assumption. The central question is whether the "parametric optimality formula" provides a conservative gradient, a notion of generalized derivative well suited for optimization. The answer is positive in a semi-algebraic setting, and more generally in definable settings. As a consequence, the ridge method applied to definable objectives is shown to have a minimizing behavior and to converge to a set of equilibria satisfying an optimality condition. Definability is key to our proof: we show that for a broader class of nonsmooth functions, conservativity of the parametric optimality formula may fail, resulting in absurd behavior of the ridge method.
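To fix ideas, here is a minimal sketch of the objects the abstract refers to; the notation (f, g, Y, the iterates x_k, y_k, and the step sizes \gamma_k) is ours for illustration, and the paper's exact assumptions and algorithmic details may differ. The min-max problem and its value function are

\[ \min_{x \in \mathbb{R}^n} g(x), \qquad g(x) := \max_{y \in Y} f(x, y), \]

and a Danskin-type parametric optimality formula collects partial subgradients at maximizers,

\[ D(x) := \operatorname{conv}\{\, v : v \in \partial_x f(x, y), \ y \in \operatorname{argmax}_{y' \in Y} f(x, y') \,\}. \]

A ridge-type iteration then descends along this set-valued field: choose y_k \in \operatorname{argmax}_{y \in Y} f(x_k, y) (possibly approximately) and set x_{k+1} = x_k - \gamma_k v_k with v_k \in \partial_x f(x_k, y_k). The question raised in the abstract is whether D is a conservative gradient for g, which is what guarantees that such iterations actually minimize g rather than behave pathologically.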
Keywords
#ridge method #conservative parametric optimality #min-max problems #convergence #nonsmooth functions