Discrimination between Gaussian process models: active learning and static constructions
Abstract
The paper covers the design and analysis of experiments to discriminate between two Gaussian process models with different covariance kernels, such as those widely used in computer experiments, kriging, sensor location and machine learning. Two frameworks are considered. First, we study sequential constructions, where successive design (observation) points are selected, either as additional points to an existing design or from the beginning of observation. The selection relies on maximising either the difference between the symmetric Kullback-Leibler divergences for the two models, which depends on the observations, or the mean squared error of both models, which does not. Then, we consider static criteria, such as the familiar log-likelihood ratios and the Fréchet distance between the covariance functions of the two models. Other distance-based criteria, simpler to compute than the previous ones, are also introduced; for these, in the framework of approximate design, a necessary condition for the optimality of a design measure is provided. The paper also studies the mathematical links between the different criteria and provides numerical illustrations.
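To make two of the quantities named above concrete, the following is a minimal sketch, not the paper's own criteria or code. It evaluates the standard closed-form symmetric Kullback-Leibler divergence and the squared Fréchet distance between two zero-mean Gaussian vectors whose covariance matrices come from two kernels evaluated on a common design. The kernels (exponential and Matérn 3/2), the range parameter theta = 0.3, the six-point design and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def exponential_kernel(x, theta=0.3):
    # Assumed illustrative kernel: exp(-|x - y| / theta)
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-d / theta)

def matern32_kernel(x, theta=0.3):
    # Assumed illustrative kernel: Matern with smoothness 3/2
    a = np.sqrt(3.0) * np.abs(x[:, None] - x[None, :]) / theta
    return (1.0 + a) * np.exp(-a)

def symmetric_kl(K0, K1):
    # KL(P0||P1) + KL(P1||P0) for P_i = N(0, K_i); the log-determinant terms cancel
    n = K0.shape[0]
    t01 = np.trace(np.linalg.solve(K1, K0))
    t10 = np.trace(np.linalg.solve(K0, K1))
    return 0.5 * (t01 + t10 - 2 * n)

def psd_sqrt(A):
    # Symmetric square root of a positive semi-definite matrix via eigendecomposition
    A = 0.5 * (A + A.T)
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)
    return (V * np.sqrt(w)) @ V.T

def frechet_sq(K0, K1):
    # Squared Frechet distance between N(0, K0) and N(0, K1):
    # trace(K0 + K1 - 2 (K0^{1/2} K1 K0^{1/2})^{1/2})
    S = psd_sqrt(K0)
    M = psd_sqrt(S @ K1 @ S)
    return np.trace(K0 + K1 - 2.0 * M)

# Hypothetical six-point design on [0, 1]
design = np.linspace(0.0, 1.0, 6)
K0 = exponential_kernel(design)
K1 = matern32_kernel(design)

skl = symmetric_kl(K0, K1)
fre = frechet_sq(K0, K1)
print("symmetric KL:", skl)
print("squared Frechet distance:", fre)
```

Both quantities are zero when the two covariance matrices coincide and grow as the kernels become easier to tell apart on the chosen design, which is the sense in which a static criterion can rank candidate designs.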