Optimal learning with a local parametric belief model
Abstract
We are interested in maximizing smooth functions where observations are noisy and expensive to compute, as might arise in computer simulations or laboratory experiments. We derive a knowledge gradient policy, which chooses the measurement that maximizes the expected value of information, while using a locally parametric belief model built from linear approximations with radial basis functions. The method uses a compact representation of the function that avoids storing the entire measurement history, as is typically required by nonparametric methods. Our technique evaluates a measurement in terms of its ability to improve our estimate of the optimum, capturing correlations in our beliefs about neighboring regions of the function, without imposing any a priori assumptions on the global shape of the underlying function. Experimental work suggests that the method adapts to a range of arbitrary, continuous functions and reliably finds the optimal solution. Moreover, the policy is shown to be asymptotically optimal.
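The knowledge gradient idea described above can be sketched in code. The snippet below is a minimal Monte Carlo illustration of the correlated knowledge-gradient factor for a discrete set of alternatives under a multivariate normal belief, not the paper's locally parametric RBF estimator; the function names (`kg_factor_mc`, `kg_policy`) and the RBF covariance in the usage example are illustrative assumptions.

```python
import numpy as np

def kg_factor_mc(mu, Sigma, noise_var, x, n_samples=10000, rng=None):
    """Monte Carlo estimate of the knowledge-gradient factor for
    measuring alternative x under a correlated normal belief
    (mu, Sigma) with Gaussian observation noise of variance noise_var."""
    rng = np.random.default_rng(rng)
    # Measuring x shifts the entire posterior mean vector along
    # sigma_tilde = Sigma[:, x] / sqrt(Sigma[x, x] + noise_var);
    # correlations in Sigma propagate the update to neighboring points.
    sigma_tilde = Sigma[:, x] / np.sqrt(Sigma[x, x] + noise_var)
    z = rng.standard_normal(n_samples)
    # Simulated posterior mean vectors after one measurement of x.
    post_means = mu[None, :] + z[:, None] * sigma_tilde[None, :]
    # KG factor: expected improvement in the maximum of the posterior mean.
    return post_means.max(axis=1).mean() - mu.max()

def kg_policy(mu, Sigma, noise_var, **kw):
    """Choose the alternative with the largest knowledge-gradient factor."""
    scores = [kg_factor_mc(mu, Sigma, noise_var, x, **kw)
              for x in range(len(mu))]
    return int(np.argmax(scores))

# Usage sketch: five points on [0, 1] with a Gaussian RBF prior covariance.
pts = np.linspace(0.0, 1.0, 5)
Sigma = np.exp(-10.0 * (pts[:, None] - pts[None, :]) ** 2)
mu = np.zeros(5)
x_next = kg_policy(mu, Sigma, noise_var=0.5, n_samples=5000, rng=0)
```

The Monte Carlo form is chosen only for brevity; the exact computation of this expectation for correlated normal beliefs is given in Frazier et al. (2009), cited below.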
References
Agrawal, R.: The continuum-armed bandit problem. SIAM J. Control Optim. 33(6), 1926–1951 (1995)
Barton, R.R., Meckesheimer, M.: Chapter 18: Metamodel-based simulation optimization. In: Henderson, S.G., Nelson, B.L. (eds.) Simulation, Handbooks in Operations Research and Management Science, vol. 13, pp. 535–574. Elsevier, Amsterdam (2006)
Barut, E., Powell, W.B.: Optimal learning for sequential sampling with non-parametric beliefs. J. Glob. Optim. 58(3), 517–543 (2014). doi:10.1007/s10898-013-0050-5
Branin, F.H.: Widely convergent method for finding multiple solutions of simultaneous nonlinear equations. IBM J. Res. Dev. 16(5), 504–522 (1972)
Calvin, J., Žilinskas, A.: One-dimensional global optimization for observations with noise. Comput. Math. Appl. 50(1–2), 157–169 (2005). doi:10.1016/j.camwa.2004.12.014
Chang, F., Lai, T.L.: Optimal stopping and dynamic allocation. Adv. Appl. Probab. 19(4), 829–853 (1987)
Chick, S.E., Gans, N.: Economic analysis of simulation selection problems. Manag. Sci. 55(3), 421–437 (2009)
Cochran, W., Cox, G.: Experimental Designs. Wiley, New York (1957)
Cohn, D.A., Ghahramani, Z., Jordan, M.I.: Active learning with statistical models. J. Artif. Intell. Res. 4, 129–145 (1996)
Eubank, R.L.: Spline Smoothing and Nonparametric Regression, 2nd edn. Marcel Dekker, New York (1988)
Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability Series. Chapman and Hall/CRC, London (1996)
Frazier, P.I., Powell, W.B., Dayanik, S.: A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 47(5), 2410–2439 (2008)
Frazier, P.I., Powell, W.B., Dayanik, S.: The knowledge-gradient policy for correlated normal beliefs. INFORMS J. Comput. 21(4), 599–613 (2009)
Fu, M.C.: Chapter 19: Gradient estimation. In: Henderson, S.G., Nelson, B.L. (eds.) Simulation, Handbooks in Operations Research and Management Science, vol. 13, pp. 575–616. Elsevier, Amsterdam (2006)
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn. Chapman and Hall/CRC, London (2004)
Gibbs, M.N.: Bayesian Gaussian processes for regression and classification. Ph.D. thesis, University of Cambridge (1997)
Ginebra, J., Clayton, M.K.: Response surface bandits. J. R. Stat. Soc. Ser. B (Methodol.) 57(4), 771–784 (1995)
Gittins, J.C.: Bandit processes and dynamic allocation indices. J. R. Stat. Soc. Ser. B (Methodol.) 41(2), 148–177 (1979)
Gupta, S.S., Miescke, K.J.: Bayesian look ahead one-stage sampling allocations for selection of the best population. J. Stat. Plan. Inference 54(2), 229–244 (1996)
Hannah, L.A., Blei, D.M., Powell, W.B.: Dirichlet process mixtures of generalized linear models. J. Mach. Learn. Res. 12, 1923–1953 (2011)
Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall PTR, Upper Saddle River (1998)
Huang, D., Allen, T., Notz, W., Zeng, N.: Global optimization of stochastic black-box systems via sequential kriging meta-models. J. Glob. Optim. 34(3), 441–466 (2006)
Jamshidi, A.A., Kirby, M.J.: Towards a black box algorithm for nonlinear function approximation over high-dimensional domains. SIAM J. Sci. Comput. 29(3), 941–964 (2007)
Jamshidi, A.A., Powell, W.B.: A recursive local polynomial approximation method using Dirichlet clouds and radial basis functions. Working Paper. Princeton University, New Jersey (2013)
Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13(4), 455–492 (1998). doi:10.1023/A:1008306431147
Jones, R., Lee, Y.C., Barnes, C.W., Flake, G., Lee, K., Lewis, P., Qian, S.: Function approximation and time series prediction with neural networks. In: IJCNN International Joint Conference on Neural Networks, vol. 1, pp. 649–665 (1990)
Knuth, D.E.: The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 3rd edn. Addison-Wesley, Boston (1998)
Kushner, H.J.: A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Fluids Eng. 86(1), 97–106 (1964). doi:10.1115/1.3653121
Mes, M.R., Powell, W.B., Frazier, P.I.: Hierarchical knowledge gradient for sequential sampling. J. Mach. Learn. Res. 12, 2931–2974 (2011)
Montgomery, D.C., Peck, E.A., Vining, G.G.: Introduction to Linear Regression Analysis, 4th edn. Wiley, Hoboken (2006)
Muller, H.G.: Weighted local regression and kernel methods for nonparametric curve fitting. J. Am. Stat. Assoc. 82(397), 231–238 (1987)
Negoescu, D.M., Frazier, P.I., Powell, W.B.: The knowledge-gradient algorithm for sequencing experiments in drug discovery. INFORMS J. Comput. 23(3), 346–363 (2010)
Nelson, B.L., Swann, J., Goldsman, D., Song, W.: Simple procedures for selecting the best simulated system when the number of alternatives is large. Oper. Res. 49(6), 950–963 (2001)
Powell, W.B., Ryzhov, I.O.: Optimal Learning. Wiley, London (2012)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Ryzhov, I.O., Powell, W.B., Frazier, P.: The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1), 180–195 (2012)
Scott, W., Frazier, P., Powell, W.B.: The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression. SIAM J. Optim. 21(3), 996–1026 (2011)
Spall, J.C.: Introduction to Stochastic Search and Optimization, 1st edn. Wiley, New York (2003)
Villemonteix, J., Vazquez, E., Walter, E.: An informational approach to the global optimization of expensive-to-evaluate functions. J. Glob. Optim. 44(4), 509–534 (2009)
Žilinskas, A.: Two algorithms for one-dimensional multimodal minimization. Math. Oper. Stat. Ser. Optim. 12(1), 53–63 (1981). doi:10.1080/02331938108842707