Model-Based Geostatistics Under Spatially Varying Preferential Sampling
Journal of Agricultural, Biological and Environmental Statistics - Pages 1-27 - 2023
Abstract
Geostatistics is concerned with the estimation and prediction of spatially continuous phenomena using data obtained at a discrete set of locations. In geostatistics, preferential sampling occurs when these locations are not independent of the latent spatial field, and common modeling approaches that ignore this dependence structure can yield wrong inferences. To overcome this issue, several methods have been proposed to model data collected under preferential sampling. However, these methods assume a constant degree of preferentiality, whereas real data may present a degree of preferentiality that varies over space. For that reason, we propose a new model that accounts for preferential sampling by including a spatially varying coefficient that describes the strength of the dependence between the process that models the sampling locations and the latent field. To do so, we approximate the preferentiality component by a set of basis functions, with the corresponding coefficients estimated using the integrated nested Laplace approximation (INLA) method. This allows the degree of preferentiality to vary over the domain at a low computational cost. We assess the performance of our model by means of a simulation study and use it to analyze the average $\text{PM}_{2.5}$ levels in the USA in 2022. We conclude that, given enough observed events, our model, along with the implemented inference routine, recovers both the latent field and the spatially varying preferentiality surface well, even under misspecified scenarios. We also offer guidelines for the specification and size of the set of basis functions. Supplementary materials accompanying this paper appear online.
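As a rough illustration of the basis-function idea described in the abstract, the sketch below writes a spatially varying preferentiality coefficient as a weighted sum of compactly supported radial functions, gamma(s) = sum_k theta_k B_k(s). Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of a Wendland-type basis, the knot grid, the support radius, the coefficient values, the log-linear form of the sampling intensity, and all function names. In the paper the coefficients are estimated with INLA; here they are simply fixed by hand.

```python
import math


def wendland(d):
    """Wendland-type C2 radial function: (1 - d)^4 (4d + 1) on [0, 1), zero beyond."""
    if d >= 1.0:
        return 0.0
    return (1.0 - d) ** 4 * (4.0 * d + 1.0)


def basis_row(s, knots, radius):
    """Evaluate every basis function B_k at the location s = (x, y)."""
    return [wendland(math.dist(s, c) / radius) for c in knots]


def preferentiality(s, knots, radius, theta):
    """Spatially varying coefficient gamma(s) = sum_k theta_k * B_k(s)."""
    return sum(t * b for t, b in zip(theta, basis_row(s, knots, radius)))


def log_intensity(s, field_value, alpha, knots, radius, theta):
    """Assumed log-intensity of the sampling process: alpha + gamma(s) * S(s)."""
    return alpha + preferentiality(s, knots, radius, theta) * field_value


# Hypothetical setup: a 3 x 3 grid of knots on the unit square.
knots = [(x / 2, y / 2) for x in range(3) for y in range(3)]
radius = 0.75

# Made-up coefficients: sampling is preferential only near the western edge.
theta = [1.0 if cx < 0.5 else 0.0 for cx, _ in knots]

gamma_west = preferentiality((0.0, 0.5), knots, radius, theta)
gamma_east = preferentiality((1.0, 0.5), knots, radius, theta)
```

With these made-up coefficients, the coefficient is positive at the western location and exactly zero at the eastern one, because no basis function with a nonzero weight reaches that far. A compactly supported basis keeps each evaluation local, which is one plausible reason such bases are convenient when the number of basis functions grows.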