On the selection of thresholds for predicting species occurrence with presence‐only data

Ecology and Evolution, Volume 6, Issue 1, Pages 337-348, 2016
Canran Liu¹, Graeme Newell¹, Matt White¹
¹Arthur Rylah Institute for Environmental Research, Department of Environment, Land, Water and Planning, Heidelberg, Victoria 3084, Australia

Abstract

Presence‐only data present challenges for selecting thresholds to transform species distribution modeling results into binary outputs. In this article, we compare two recently published threshold selection methods (maxSSS and maxFpb) and examine the effectiveness of the threshold‐based prevalence estimation approach. Six virtual species with varying prevalence were simulated within a real landscape in southeastern Australia. Presence‐only models were built with DOMAIN, a generalized linear model (GLM), Maxent, and Random Forest. Thresholds were selected with the two methods, maxSSS and maxFpb, using four presence‐only datasets that differed in the ratio of the number of known presences to the number of random points (KPRPratio). Sensitivity, specificity, true skill statistic (TSS), and F measure were used to evaluate the results. Species prevalence was estimated as the ratio of the number of predicted presences to the total number of points in the evaluation dataset. Thresholds selected with maxFpb varied as the KPRPratio of the threshold selection datasets changed. Datasets with a KPRPratio around 1 generally produced better results than those with ratios far from 1. With Random Forest and Maxent models, maxFpb produced specificity that was too low for very common species. In contrast, maxSSS produced consistent results whichever dataset was used. The estimation of prevalence was almost always biased, and the bias was very large for DOMAIN and Random Forest predictions. We conclude that maxFpb is affected by the KPRPratio of the threshold selection datasets, but maxSSS is almost unaffected by this ratio. Unbiased estimates of prevalence are difficult to obtain using the threshold‐based approach.
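The abstract names maxSSS, sensitivity, specificity, TSS, the F measure, and the threshold-based prevalence estimate but gives no formulas. The sketch below is one minimal Python reading of those quantities, not the authors' code. It assumes (1) suitability scores where higher means more likely present, (2) a point is predicted present when its score is at or above the threshold, (3) maxSSS maximizes sensitivity plus specificity, with random points standing in for absences during threshold selection, and (4) the evaluation set carries true presences and absences, as it can with virtual species. The function names are hypothetical, candidate thresholds are taken from the observed scores, and maxFpb is not implemented because its criterion is not defined in this section.

```python
# Minimal sketch (hypothetical helper names): maxSSS threshold selection on
# presence-only data, plus the evaluation statistics named in the abstract.
from typing import Dict, Sequence, Tuple


def sens_spec(pos: Sequence[float], neg: Sequence[float], t: float) -> Tuple[float, float]:
    """Share of `pos` scored >= t and share of `neg` scored < t."""
    sens = sum(s >= t for s in pos) / len(pos)
    spec = sum(s < t for s in neg) / len(neg)
    return sens, spec


def max_sss_threshold(presence: Sequence[float], random_pts: Sequence[float]) -> float:
    """Pick the cut-off maximizing sensitivity + specificity.

    Assumption: random (background) points stand in for absences, so the
    "specificity" here is only a proxy. Candidate thresholds are the observed
    scores; on ties the lowest candidate is kept. Inputs must be non-empty.
    """
    best_t, best_sum = None, float("-inf")
    for t in sorted(set(presence) | set(random_pts)):
        sens, spec = sens_spec(presence, random_pts, t)
        if sens + spec > best_sum:
            best_t, best_sum = t, sens + spec
    return best_t


def evaluate(presence: Sequence[float], absence: Sequence[float], t: float) -> Dict[str, float]:
    """Metrics on an evaluation set with true presences and true absences."""
    tp = sum(s >= t for s in presence)
    fn = len(presence) - tp
    fp = sum(s >= t for s in absence)
    tn = len(absence) - fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * precision * sens / (precision + sens) if precision + sens else 0.0
    return {
        "sensitivity": sens,
        "specificity": spec,
        "tss": sens + spec - 1,  # true skill statistic
        "f": f,  # F measure: harmonic mean of precision and sensitivity
        # Threshold-based prevalence: predicted presences / all points.
        "prevalence_est": (tp + fp) / (tp + fn + fp + tn),
    }


if __name__ == "__main__":
    t = max_sss_threshold([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6, 0.8])
    print(t)  # 0.4
    metrics = evaluate([0.9, 0.5, 0.3], [0.45, 0.42, 0.1, 0.05], t)
    print({k: round(v, 3) for k, v in metrics.items()})
```

In this toy evaluation set the true prevalence is 3/7, while the threshold-based estimate is 4/7, a small illustration of the kind of bias the abstract reports. Because the maxSSS criterion here adds two proportions, each computed within its own group, drawing more or fewer random points relative to known presences does not directly reweight it; this is one plausible reason, not a claim from the paper, why maxSSS is reported as nearly unaffected by the KPRPratio.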
