Secant penalized BFGS: a noise robust quasi-Newton method via penalizing the secant condition
Abstract
In this paper, we introduce a new variant of the BFGS method designed to perform well when gradient measurements are corrupted by noise. We show that treating the secant condition with a penalty method approach, motivated by regularized least squares estimation, generates a parametric family of updates with the original BFGS update at one extreme and no update of the inverse Hessian approximation at the other. Furthermore, we find that the curvature condition is relaxed as the family moves toward the no-update extreme, where it disappears entirely. These developments yield a method we refer to as Secant Penalized BFGS (SP-BFGS), which relaxes the secant condition based on the amount of noise in the gradient measurements. SP-BFGS provides a means of incrementally updating the new inverse Hessian approximation with a controlled amount of bias toward the previous inverse Hessian approximation, replacing the overwriting nature of the original BFGS update with an averaging nature that resists the destructive effects of noise and can cope with negative curvature measurements. We discuss the theoretical properties of SP-BFGS, including convergence when minimizing strongly convex functions in the presence of uniformly bounded noise. Finally, we present extensive numerical experiments using over 30 problems from the CUTEst test problem set that demonstrate the superior performance of SP-BFGS compared to BFGS in the presence of both noisy function and gradient evaluations.
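For context, the sketch below (a minimal Julia illustration, not code from the paper; the function name bfgs_inverse_update is our own) shows the classical BFGS inverse-Hessian update, which enforces the secant condition H_{k+1} y_k = s_k exactly. The SP-BFGS family described above replaces this exact enforcement with a penalty: one extreme of the penalty parameter recovers the update below, while the other extreme skips the update and keeps H_k unchanged, so no curvature condition is required there.

```julia
using LinearAlgebra

# Classical (inverse) BFGS update, cf. Nocedal & Wright (2006):
#   H⁺ = (I − ρ s yᵀ) H (I − ρ y sᵀ) + ρ s sᵀ,   ρ = 1 / (yᵀ s),
# where s = x⁺ − x and y = ∇f(x⁺) − ∇f(x). The result satisfies the
# secant condition H⁺ y = s exactly, which is precisely what makes the
# update fragile when y is contaminated by gradient noise.
function bfgs_inverse_update(H::AbstractMatrix, s::AbstractVector, y::AbstractVector)
    ρ = 1 / dot(y, s)               # requires the curvature condition yᵀs > 0
    V = I - ρ * y * s'              # I − ρ y sᵀ
    return V' * H * V + ρ * s * s'  # Vᵀ H V + ρ s sᵀ
end
```

At the other extreme of the penalty family, the update degenerates to H⁺ = H; SP-BFGS interpolates between these two behaviors according to the noise level.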
References
Aydin, L., Aydin, O., Artem, H.S., Mert, A.: Design of dimensionally stable composites using efficient global optimization method. Proc. Inst. Mech. Eng. Part L: J. Mater. Design Appl. 233(2), 156–168 (2019). https://doi.org/10.1177/1464420716664921
Berahas, A.S., Byrd, R.H., Nocedal, J.: Derivative-free optimization of noisy functions via quasi-Newton methods. SIAM J. Optim. 29, 965–993 (2019). https://doi.org/10.1137/18M1177718
Besançon, M., Anthoff, D., Arslan, A., Byrne, S., Lin, D., Papamarkou, T., Pearson, J.: Distributions.jl: Definition and modeling of probability distributions in the JuliaStats ecosystem. arXiv e-prints arXiv:1907.08611 (2019)
Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: a fresh approach to numerical computing. SIAM Rev. 59(1), 65–98 (2017). https://doi.org/10.1137/141000671
Bons, N.P., He, X., Mader, C.A., Martins, J.R.R.A.: Multimodality in aerodynamic wing design optimization. AIAA J. 57(3), 1004–1018 (2019). https://doi.org/10.2514/1.J057294
Broyden, C.G.: The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 6(1), 76–90 (1970). https://doi.org/10.1093/imamat/6.1.76
Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016). https://doi.org/10.1137/140954362
Byrd, R.H., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16(5), 1190–1208 (1995). https://doi.org/10.1137/0916069
Byrd, R.H., Nocedal, J.: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 26(3), 727–739 (1989). https://doi.org/10.2307/2157680
Byrd, R.H., Nocedal, J., Yuan, Y.X.: Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 24(5), 1171–1190 (1987). https://doi.org/10.2307/2157646
Chang, D., Sun, S., Zhang, C.: An accelerated linearly convergent stochastic L-BFGS algorithm. IEEE Trans. Neural Netw. Learn. Syst. 30(11), 3338–3346 (2019). https://doi.org/10.1109/TNNLS.2019.2891088
Fasano, G., Pintér, J.D.: Modeling and Optimization in Space Engineering: State of the Art and New Challenges. Springer (2019). https://doi.org/10.1007/978-1-4614-4469-5
Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970). https://doi.org/10.1093/comjnl/13.3.317
Gal, R., Haber, E., Irwin, B., Saleh, B., Ziv, A.: How to catch a lion in the desert: on the solution of the coverage directed generation (CDG) problem. Optim. Eng. 22, 217–245 (2021). https://doi.org/10.1007/s11081-020-09507-w
Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24(109), 23–26 (1970). https://doi.org/10.2307/2004873
Gould, N.I.M., Orban, D., contributors: The Constrained and Unconstrained Testing Environment with safe threads (CUTEst) for optimization software. https://github.com/ralna/CUTEst (2019)
Gould, N.I.M., Orban, D., Toint, P.L.: CUTEr a Constrained and Unconstrained Testing Environment, revisited. https://www.cuter.rl.ac.uk (2001)
Gould, N.I.M., Orban, D., Toint, P.L.: CUTEr and SifDec: a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29(4), 373–394 (2003). https://doi.org/10.1145/962437.962439
Gould, N.I.M., Orban, D., Toint, P.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput. Optim. Appl. 60(3), 545–557 (2015). https://doi.org/10.1007/s10589-014-9687-3
Gower, R., Goldfarb, D., Richtárik, P.: Stochastic block BFGS: squeezing more curvature out of data. In: Balcan, M.F., Weinberger, K.Q. (eds.) Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1869–1878. PMLR, New York, New York, USA (2016). http://proceedings.mlr.press/v48/gower16.html
Graf, P.A., Billups, S.: MDTri: robust and efficient global mixed integer search of spaces of multiple ternary alloys. Comput. Optim. Appl. 68(3), 671–687 (2017). https://doi.org/10.1007/s10589-017-9922-9
Güler, O., Gürtuna, F., Shevchenko, O.: Duality in quasi-Newton methods and new variational characterizations of the DFP and BFGS updates. Optim. Methods Softw. 24(1), 45–62 (2009). https://doi.org/10.1080/10556780802367205
Hager, W.W.: Updating the inverse of a matrix. SIAM Rev. 31(2), 221–239 (1989). https://doi.org/10.2307/2030425
Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, New York (2013). https://doi.org/10.1017/CBO9781139020411
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315–323 (2013). https://doi.org/10.5555/2999611.2999647
Johnson, S.G.: Quasi-Newton optimization: Origin of the BFGS update (2019). https://ocw.mit.edu/courses/mathematics/18-335j-introduction-to-numerical-methods-spring-2019/week-11/MIT18_335JS19_lec30.pdf
Keane, A.J., Nair, P.B.: Computational Approaches for Aerospace Design: The Pursuit of Excellence. Wiley (2005). https://doi.org/10.1002/0470855487
Kelley, C.: Implicit Filtering. SIAM, Philadelphia (2011). https://doi.org/10.1137/1.9781611971903
Koziel, S., Ogurtsov, S.: Antenna Design by Simulation-Driven Optimization. Springer (2014). https://doi.org/10.1007/978-3-319-04367-8
Lewis, A.S., Overton, M.L.: Nonsmooth optimization via quasi-Newton methods. Math. Program. 141, 135–163 (2013). https://doi.org/10.1007/s10107-012-0514-2
Lin, D., White, J.M., Byrne, S., Bates, D., Noack, A., Pearson, J., Arslan, A., Squire, K., Anthoff, D., Papamarkou, T., Besançon, M., Drugowitsch, J., Schauer, M., other contributors: JuliaStats/Distributions.jl: a Julia package for probability distributions and associated functions. https://github.com/JuliaStats/Distributions.jl (2019). https://doi.org/10.5281/zenodo.2647458
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989). https://doi.org/10.1007/BF01589116
Mokhtari, A., Ribeiro, A.: Global convergence of online limited memory BFGS. J. Mach. Learn. Res. 16(1), 3151–3181 (2015). https://doi.org/10.5555/2789272.2912100
Moritz, P., Nishihara, R., Jordan, M.: A linearly-convergent stochastic L-BFGS algorithm. In: Gretton, A., Robert, C.C. (eds.) Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 51, pp. 249–258. PMLR, Cadiz, Spain (2016). http://proceedings.mlr.press/v51/moritz16.html
Muñoz-Rojas, P.A.: Computational Modeling, Optimization and Manufacturing Simulation of Advanced Engineering Materials. Springer (2016). https://doi.org/10.1007/978-3-319-04265-7
Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (2006). https://doi.org/10.1007/978-0-387-40065-5
Orban, D., Siqueira, A.S., contributors: CUTEst.jl: Julia’s CUTEst interface. https://github.com/JuliaSmoothOptimizers/CUTEst.jl (2020). https://doi.org/10.5281/zenodo.1188851
Orban, D., Siqueira, A.S., contributors: NLPModels.jl: Data structures for optimization models. https://github.com/JuliaSmoothOptimizers/NLPModels.jl (2020). https://doi.org/10.5281/zenodo.2558627
Powell, M.J.D.: Algorithms for nonlinear constraints that use Lagrangian functions. Math. Program. 14(1), 224–248 (1978). https://doi.org/10.1007/BF01588967
Rosenbrock, H.H.: An automatic method for finding the greatest or least value of a function. Comput. J. 3(3), 175–184 (1960). https://doi.org/10.1093/comjnl/3.3.175
Schraudolph, N.N., Yu, J., Günter, S.: A stochastic quasi-Newton method for online convex optimization. In: Meila, M., Shen, X. (eds.) Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 2, pp. 436–443. PMLR, San Juan, Puerto Rico (2007). http://proceedings.mlr.press/v2/schraudolph07a.html
Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970). https://doi.org/10.2307/2004840
Shi, H.J.M., Xie, Y., Byrd, R., Nocedal, J.: A noise-tolerant quasi-Newton algorithm for unconstrained optimization. SIAM J. Optim. 32(1), 29–55 (2022). https://doi.org/10.1137/20M1373190
Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM J. Optim. 27(2), 927–956 (2017). https://doi.org/10.1137/15M1053141
Xie, Y., Byrd, R.H., Nocedal, J.: Analysis of the BFGS method with errors. SIAM J. Optim. 30(1), 182–209 (2020). https://doi.org/10.1137/19M1240794
Zhao, R., Haskell, W.B., Tan, V.Y.F.: Stochastic L-BFGS: improved convergence rates and practical acceleration strategies. IEEE Trans. Signal Process. 66, 1155–1169 (2018). https://doi.org/10.1109/TSP.2017.2784360
Zhu, J.: Optimization of Power System Operation. Wiley (2008). https://doi.org/10.1002/9780470466971