Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation

Ecography - Volume 31, Issue 2 - Pages 161-175 - 2008
Steven J. Phillips1, Miroslav Dudík
1AT&T

Abstract

Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time‐consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use “default settings”, tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence‐only data. We evaluate our method on independently collected high‐quality presence‐absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce “hinge features” that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore “background sampling” strategies that cope with sample selection bias and decrease model‐building time. 
Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence‐only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) “target‐group” background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.
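The abstract names "hinge features" without defining them. As a rough illustration only, the sketch below assumes the usual piecewise-linear form: a forward hinge is zero below a knot and rises linearly to one at the maximum of the variable, and a reverse hinge mirrors it. The function name `hinge_feature`, the knot value, and the temperature values are hypothetical, and this is not the Maxent software's implementation.

```python
def hinge_feature(values, knot, reverse=False):
    """Piecewise-linear hinge feature, scaled to [0, 1] over the observed range.

    Assumed definition (not taken from the abstract):
      forward: 0 for x <= knot, then (x - knot) / (max - knot)
      reverse: (knot - x) / (knot - min) for x < knot, then 0
    """
    lo, hi = min(values), max(values)
    if not lo <= knot <= hi:
        raise ValueError("knot must lie within the observed range")

    out = []
    for x in values:
        if reverse:
            span = knot - lo
            # Equals 1 at the minimum and falls linearly to 0 at the knot.
            out.append((knot - x) / span if x < knot and span > 0 else 0.0)
        else:
            span = hi - knot
            # Equals 0 up to the knot and rises linearly to 1 at the maximum.
            out.append((x - knot) / span if x > knot and span > 0 else 0.0)
    return out


# Hypothetical environmental variable (e.g. mean temperature at five sites).
temperature = [0.0, 5.0, 10.0, 15.0, 20.0]
forward = hinge_feature(temperature, knot=10.0)
reverse = hinge_feature(temperature, knot=10.0, reverse=True)
```

A model that sums several such features with different knots can approximate a fairly arbitrary piecewise-linear response to one variable, which is one way to read the claim that hinge features "model more complex relationships in the training data".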
