Model-Based Geostatistics Under Spatially Varying Preferential Sampling

André Victor Ribeiro Amaral¹, Elias Teixeira Krainski¹, Ruiman Zhong¹, Paula Moraga¹
¹CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Abstract

Geostatistics is concerned with the estimation and prediction of spatially continuous phenomena using data obtained at a discrete set of locations. In geostatistics, preferential sampling occurs when these locations are not independent of the latent spatial field, and common modeling approaches that ignore this dependence may yield incorrect inferences. Several methods have been proposed to model data collected under preferential sampling. However, these methods assume a constant degree of preferentiality, whereas in real data the degree of preferentiality may vary over space. We therefore propose a new model that accounts for preferential sampling by including a spatially varying coefficient that describes the strength of the dependence between the process that models the sampling locations and the latent field. To do so, we approximate the preferentiality component by a set of basis functions whose coefficients are estimated using the integrated nested Laplace approximation (INLA) method. This allows the degree of preferentiality to vary over the domain at a low computational cost. We assess the performance of our model through a simulation study and use it to analyze average $\text{PM}_{2.5}$ levels in the USA in 2022. We conclude that, given enough observed events, our model and the implemented inference routine recover both the latent field and the spatially varying preferentiality surface well, even under misspecified scenarios. We also offer guidelines for the specification and size of the set of basis functions. Supplementary materials accompanying this paper appear online.
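To make the basis-function idea concrete, here is a minimal sketch. It assumes a log-linear intensity for the sampling-location process of the form log λ(s) = α + β(s) S(s), where S(s) is the latent field and the constant preferentiality parameter of standard shared-field models is replaced by β(s) = Σ_k w_k φ_k(s). The Wendland C² basis, the k × k grid of centres, the 2.5 × spacing support radius, and every numerical value are illustrative assumptions, not the paper's specification. The weights w_k are fixed by hand here. The abstract states that the real model estimates them with INLA, which this sketch does not attempt.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def wendland_c2(r: float) -> float:
    """Wendland C^2 function (1 - r)_+^4 (4r + 1); zero for r >= 1 (compact support)."""
    if r >= 1.0:
        return 0.0
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)


def grid_centers(k: int) -> list[Point]:
    """Hypothetical choice: a k x k regular grid of basis centres on the unit square."""
    if k < 2:
        raise ValueError("need at least a 2 x 2 grid of centres")
    step = 1.0 / (k - 1)
    return [(i * step, j * step) for i in range(k) for j in range(k)]


def basis_matrix(points: Sequence[Point], centers: Sequence[Point],
                 radius: float) -> list[list[float]]:
    """Phi[i][k] = phi_k(s_i); each basis function vanishes beyond `radius`."""
    return [[wendland_c2(math.dist(p, c) / radius) for c in centers] for p in points]


def preferentiality(points: Sequence[Point], centers: Sequence[Point],
                    radius: float, weights: Sequence[float]) -> list[float]:
    """beta(s_i) = sum_k w_k phi_k(s_i): the spatially varying coefficient."""
    phi = basis_matrix(points, centers, radius)
    return [sum(f * w for f, w in zip(row, weights)) for row in phi]


def log_intensity(alpha: float, beta: Sequence[float],
                  latent: Sequence[float]) -> list[float]:
    """Assumed form log lambda(s_i) = alpha + beta(s_i) * S(s_i)."""
    return [alpha + b * s for b, s in zip(beta, latent)]


if __name__ == "__main__":
    centers = grid_centers(3)          # 9 basis functions, grid spacing 0.5
    radius = 2.5 * 0.5                 # hypothetical overlap: 2.5 x grid spacing
    weights = [0.0, 0.5, 1.0] * 3      # illustrative weights (increase with y), not estimates
    pts = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
    latent = [0.2, -0.1, 0.4]          # illustrative latent-field values S(s)
    beta = preferentiality(pts, centers, radius, weights)
    print(log_intensity(alpha=1.0, beta=beta, latent=latent))
```

Setting all weights to zero recovers a non-preferential design, because the intensity no longer depends on S(s). A constant β(s) corresponds to the constant-preferentiality case that the abstract says existing methods assume. The number of centres k controls how flexibly β(s) can vary, which is the trade-off behind the guidelines on the size of the basis.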
