Bayesian nonstationary Gaussian process models via treed process convolutions

Advances in Data Analysis and Classification - Volume 13 - Pages 797-818 - 2018
Waley W. J. Liang¹, Herbert K. H. Lee¹
¹Department of Applied Mathematics and Statistics, University of California, Santa Cruz, USA

Abstract

The Gaussian process is a common model in a wide variety of applications, such as environmental modeling, computer experiments, and geology. Two major challenges often arise. First, assuming that the process of interest is stationary over the entire domain often proves to be untenable. Second, the traditional Gaussian process model formulation is computationally inefficient for large datasets. In this paper, we propose a new Gaussian process model, based on the convolution of a smoothing kernel with a partitioned latent process, to tackle these problems. Nonstationarity can be modeled by allowing a separate latent process for each partition, which approximates a regional clustering structure. Partitioning follows a binary tree generating process similar to that of Classification and Regression Trees. A Bayesian approach is used to estimate the partitioning structure and model parameters simultaneously. Our motivating dataset consists of 11,918 precipitation anomalies. Results show that our model has promising prediction performance and is computationally efficient for large datasets.
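The core construction in the abstract can be illustrated with a small simulation. This is a hypothetical, simplified sketch and not the authors' implementation. It makes several assumptions. First, it uses a single fixed axis-aligned split (a depth-one tree) rather than a tree drawn from a prior. Second, it uses a Gaussian smoothing kernel. Third, it uses independent normal latent values on a regular knot grid in each leaf, which is a discretized process convolution. Fourth, it fixes the hyperparameters at known values instead of estimating the partition and parameters jointly in a Bayesian fit. All function names and settings below are illustrative.

```python
import numpy as np

# Hypothetical sketch: discretized process convolution within each leaf of a
# one-split (CART-style) partition. Not the paper's actual code or priors.

def gaussian_kernel(dist, bandwidth):
    """Isotropic Gaussian smoothing kernel k(d) = exp(-d^2 / (2 h^2))."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def kernel_matrix(points, knots, bandwidth):
    """n x m matrix K[i, j] = k(||s_i - u_j||) linking data sites to knots."""
    dist = np.linalg.norm(points[:, None, :] - knots[None, :, :], axis=-1)
    return gaussian_kernel(dist, bandwidth)

def knot_grid(x_range, y_range, per_axis):
    """Regular grid of latent-process knots covering one rectangular leaf."""
    xs = np.linspace(*x_range, per_axis)
    ys = np.linspace(*y_range, per_axis)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

def assign_leaf(points, split_dim, split_value):
    """One CART-style split: leaf 0 if s[split_dim] <= split_value, else leaf 1."""
    return (points[:, split_dim] > split_value).astype(int)

def latent_posterior_mean(K, y, latent_var, noise_var):
    """Posterior mean of knot values x for y = K x + eps, with
    x ~ N(0, latent_var I) and eps ~ N(0, noise_var I).
    Only an m x m system is solved: O(n m^2 + m^3) instead of O(n^3)."""
    m = K.shape[1]
    A = K.T @ K / noise_var + np.eye(m) / latent_var
    return np.linalg.solve(A, K.T @ y / noise_var)

rng = np.random.default_rng(0)
n = 2000
points = rng.uniform(0.0, 1.0, size=(n, 2))
leaf = assign_leaf(points, split_dim=0, split_value=0.5)

# Hypothetical leaf-specific settings: a smooth surface left of x = 0.5,
# a rougher, higher-variance surface to the right.
leaves = {
    0: dict(knots=knot_grid((0.0, 0.5), (0.0, 1.0), 6), bandwidth=0.15, latent_var=1.0),
    1: dict(knots=knot_grid((0.5, 1.0), (0.0, 1.0), 10), bandwidth=0.08, latent_var=4.0),
}
noise_var = 0.01

y = np.empty(n)
y_hat = np.empty(n)
for j, cfg in leaves.items():
    idx = leaf == j
    # Each leaf uses only its own knots and its own latent process.
    K = kernel_matrix(points[idx], cfg["knots"], cfg["bandwidth"])
    x_true = rng.normal(0.0, np.sqrt(cfg["latent_var"]), size=K.shape[1])
    y[idx] = K @ x_true + rng.normal(0.0, np.sqrt(noise_var), size=idx.sum())
    # Fit with the (assumed known) hyperparameters of that leaf.
    x_hat = latent_posterior_mean(K, y[idx], cfg["latent_var"], noise_var)
    y_hat[idx] = K @ x_hat

rmse = np.sqrt(np.mean((y - y_hat) ** 2))
print(f"leaf sizes: {np.bincount(leaf)}, in-sample RMSE: {rmse:.3f}")
```

Because each leaf has its own knots, bandwidth, and latent variance, the fitted surface can be smooth on one side of the split and rough on the other. This is the kind of nonstationarity the model targets. Data in leaf j enter only through an n_j × m_j kernel matrix, so the fit solves small m_j × m_j systems and never forms an n × n covariance matrix. That is the source of the computational savings for large datasets.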
