Discrimination between Gaussian process models: active learning and static constructions

Statistische Hefte - Volume 64 - Pages 1275-1304 - 2023
Elham Yousefi (1), Luc Pronzato (2), Markus Hainy (1), Werner G. Müller (1), Henry P. Wynn (3)
(1) Institute of Applied Statistics, Johannes Kepler University, Linz, Austria
(2) Université Côte d’Azur, CNRS, Laboratoire I3S - UMR 7271, Sophia Antipolis, France
(3) Department of Statistics, London School of Economics, London, UK

Abstract

The paper covers the design and analysis of experiments to discriminate between two Gaussian process models with different covariance kernels, such as those widely used in computer experiments, kriging, sensor location and machine learning. Two frameworks are considered. First, we study sequential constructions, where successive design (observation) points are selected, either as additions to an existing design or from the beginning of observation. The selection relies on the maximisation of either the difference between the symmetric Kullback-Leibler divergences for the two models, which depends on the observations, or the mean squared error of both models, which does not. Then, we consider static criteria, such as the familiar log-likelihood ratios and the Fréchet distance between the covariance functions of the two models. We also introduce other distance-based criteria that are simpler to compute than the previous ones; for these, within the framework of approximate design, we provide a necessary condition for the optimality of a design measure. The paper also studies the mathematical links between the different criteria and provides numerical illustrations.
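To make two of the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It evaluates the symmetric Kullback-Leibler divergence and the squared Fréchet distance between two zero-mean multivariate normal distributions whose covariance matrices come from two rival kernels evaluated at a fixed design. The kernel pair (Matérn 3/2 versus Matérn 5/2), the length-scale `theta`, the five-point design on [0, 1] and all function names are assumptions chosen for illustration. The sequential criterion described in the abstract is based on differences of such divergences and depends on the observations, which this static sketch does not attempt to reproduce.

```python
import numpy as np


def matern32(x, y, theta=1.0):
    """Matern 3/2 correlation between 1-D point sets x and y (assumed kernel)."""
    r = np.sqrt(3.0) * np.abs(x[:, None] - y[None, :]) / theta
    return (1.0 + r) * np.exp(-r)


def matern52(x, y, theta=1.0):
    """Matern 5/2 correlation between 1-D point sets x and y (assumed kernel)."""
    r = np.sqrt(5.0) * np.abs(x[:, None] - y[None, :]) / theta
    return (1.0 + r + r**2 / 3.0) * np.exp(-r)


def sym_psd_sqrt(a):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues from rounding
    return (v * np.sqrt(w)) @ v.T


def symmetric_kl(k0, k1):
    """KL(N(0,k0) || N(0,k1)) + KL(N(0,k1) || N(0,k0)).

    The log-determinant terms cancel in the sum, leaving
    0.5 * [tr(k1^{-1} k0) + tr(k0^{-1} k1)] - n.
    """
    n = k0.shape[0]
    t01 = np.trace(np.linalg.solve(k1, k0))
    t10 = np.trace(np.linalg.solve(k0, k1))
    return 0.5 * (t01 + t10) - n


def frechet_distance_sq(k0, k1):
    """Squared Frechet distance between N(0,k0) and N(0,k1):
    tr(k0) + tr(k1) - 2 tr((k0^{1/2} k1 k0^{1/2})^{1/2}).
    """
    r0 = sym_psd_sqrt(k0)
    m = r0 @ k1 @ r0
    m = 0.5 * (m + m.T)  # enforce symmetry before the eigen-decomposition
    return np.trace(k0) + np.trace(k1) - 2.0 * np.trace(sym_psd_sqrt(m))


# Hypothetical five-point design on [0, 1]
design = np.linspace(0.0, 1.0, 5)
k0 = matern32(design, design)
k1 = matern52(design, design)

print("symmetric KL divergence :", symmetric_kl(k0, k1))
print("squared Frechet distance:", frechet_distance_sq(k0, k1))
```

Both quantities are zero when the two covariance matrices coincide and grow as the kernels become easier to tell apart at the chosen design, which is why criteria of this kind can be used to compare candidate designs. The eigen-decomposition is used for the matrix square root so that the sketch needs only NumPy.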
