Minimum required number of specimen records to develop accurate species distribution models

Ecography - Volume 39, Issue 6 - Pages 542-552 - 2016
André S. J. van Proosdij (1, 2), Marc S. M. Sosef (1, 3), Jan J. Wieringa (1, 2), Niels Raes (2)
(1) Biosystematics Group, Wageningen Univ., Droevendaalsesteeg 1, NL-6708 PB Wageningen, the Netherlands
(2) Naturalis Biodiversity Center (Botany section), Darwinweg 2, NL-2333 CR Leiden, the Netherlands
(3) Botanic Garden Meise, Nieuwelaan 38, BE-1860 Meise, Belgium

Abstract

Species distribution models (SDMs) are widely used to predict the occurrence of species. Because SDMs generally use presence‐only data, validating the predicted distribution and assessing model accuracy are challenging. Model performance depends on both sample size and the species’ prevalence, that is, the fraction of the study area occupied by the species. Here, we present a novel method using simulated species to identify the minimum number of records required to generate accurate SDMs for taxa of different pre‐defined prevalence classes. We quantified model performance as a function of sample size and prevalence and found that model performance increases with sample size under constant prevalence and decreases with increasing prevalence under constant sample size. The area under the curve (AUC) is commonly used as a measure of model performance. However, when applied to presence‐only data it is prevalence‐dependent and hence not an accurate performance index. Testing the AUC of an SDM for significant deviation from random performance provides a good alternative. We assessed the minimum number of records required to obtain good model performance for species of different prevalence classes in a virtual study area and in a real African study area. The lower limit depends on the species’ prevalence, with absolute minimum sample sizes as low as 3 for narrow‐ranged and 13 for widespread species in our virtual study area, which represents an ideal, balanced, orthogonal world. The lower limit of 3, however, is flawed by statistical artefacts related to modelling species with a prevalence below 0.1. In our African study area the lower limits are higher, ranging from 14 for narrow‐ranged to 25 for widespread species. We advocate identifying the minimum sample size for any species distribution modelling by applying the novel method presented here, which is applicable to any taxonomic clade or group, study area or climate scenario.
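The idea of testing an AUC value against random performance can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`simulate_species`, `fit_and_score`, `auc_vs_null`) are hypothetical, the distance-to-centroid model is a deliberately simple stand-in for a real SDM algorithm, and the 95% threshold and 99 null replicates are assumed values. The sketch defines a simulated species of a given prevalence on a grid with two orthogonal environmental variables, samples a fixed number of records, and compares the presence-versus-background AUC with a null distribution built from equally sized sets of random points.

```python
import numpy as np

def auc(scores_pos, scores_bg):
    """Rank-based AUC: chance that a presence record outscores a background cell."""
    pos = np.asarray(scores_pos)[:, None]
    bg = np.asarray(scores_bg)[None, :]
    return float(np.mean((pos > bg) + 0.5 * (pos == bg)))

def fit_and_score(env, train_idx):
    """Toy stand-in for an SDM (assumption): suitability is the negative squared
    distance to the centroid of the training records in environmental space."""
    centroid = env[train_idx].mean(axis=0)
    return -np.sum((env - centroid) ** 2, axis=1)

def simulate_species(env, prevalence, rng):
    """Simulated species occupying the `prevalence` fraction of cells that lie
    closest to a randomly chosen environmental optimum."""
    optimum = env[rng.integers(len(env))]
    dist = np.sum((env - optimum) ** 2, axis=1)
    n_occupied = max(1, int(round(prevalence * len(env))))
    return np.argsort(dist)[:n_occupied]

def auc_vs_null(env, occupied, n_records, rng, n_null=99, level=0.95):
    """Compare the AUC of a model built from `n_records` presences with the AUCs
    of models built from the same number of randomly drawn cells."""
    records = rng.choice(occupied, size=n_records, replace=False)
    suit = fit_and_score(env, records)
    observed = auc(suit[records], suit)  # presences vs. all cells as background

    null = np.empty(n_null)
    for i in range(n_null):
        random_records = rng.choice(len(env), size=n_records, replace=False)
        null_suit = fit_and_score(env, random_records)
        null[i] = auc(null_suit[random_records], null_suit)

    threshold = float(np.quantile(null, level))
    return observed, threshold, bool(observed > threshold)

def make_virtual_area(side=50):
    """Square grid with two orthogonal environmental variables scaled to [0, 1]."""
    x, y = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
    return np.column_stack([x.ravel(), y.ravel()])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    env = make_virtual_area()
    occupied = simulate_species(env, prevalence=0.1, rng=rng)
    observed, threshold, significant = auc_vs_null(env, occupied, 20, rng)
    print(f"AUC={observed:.3f}, null 95% threshold={threshold:.3f}, "
          f"significant={significant}")
```

Repeating `auc_vs_null` over a range of `n_records` values and prevalence classes, and recording the smallest sample size at which the model reliably beats the null threshold, would mirror the kind of lower-limit search described in the abstract, under the simplifying assumptions stated above.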
