Ecography

  1600-0587

  0906-7590

  United Kingdom

Publisher: WILEY, Wiley-Blackwell Publishing Ltd

Subject area:
Ecology, Evolution, Behavior and Systematics

Featured articles

Novel methods improve prediction of species’ distributions from occurrence data
Volume 29, Issue 2 - Pages 129-151 - 2006
Jane Elith, Catherine H. Graham, Robert P. Anderson, Miroslav Dudík, Simon Ferrier, Antoine Guisan, Robert J. Hijmans, Falk Huettmann, John R. Leathwick, Anthony Lehmann, Jin Li, Lúcia G. Lohmann, Bette A. Loiselle, Glenn Manion, Craig Moritz, Miguel Nakamura, Yoshinori Nakazawa, Jacob McC. Overton, A. Townsend Peterson, Steven J. Phillips, Karen Richardson, Ricardo Scachetti‐Pereira, Robert E. Schapire, Jorge Soberón, Stephen E. Williams, Mary S. Wisz, Niklaus E. Zimmermann
Prediction of species’ distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence‐only data to fit models, and independent presence‐absence data to evaluate the predictions. Along with well‐established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species’ distributions. These include machine‐learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species’ occurrence data. Presence‐only data were effective for modelling species’ distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.
Collinearity: a review of methods to deal with it and a simulation study evaluating their performance
Volume 36, Issue 1 - Pages 27-46 - 2013
Carsten F. Dormann, Jane Elith, Sven Bacher, Carsten M. Buchmann, Gudrun Carl, Gabriel Carré, Jaime Márquez, Bernd Gruber, Bruno Lafourcade, Pedro J. Leitão, Tamara Münkemüller, Colin J. McClean, Patrick E. Osborne, Björn Reineking, Boris Schröder, Andrew K. Skidmore, Damaris Zurell, Sven Lautenbach
Collinearity refers to the non‐independence of predictor variables, usually in a regression‐type analysis. It is a common feature of any descriptive ecological data set and can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model. Collinearity is a severe problem when a model is trained on data from one region or time, and predicted to another with a different or unknown structure of collinearity. To demonstrate the reach of the problem of collinearity in ecology, we show how relationships among predictors differ between biomes, change over spatial scales and through time. Across disciplines, different approaches to addressing collinearity problems have been developed, ranging from clustering of predictors, threshold‐based pre‐selection, through latent variable methods, to shrinkage and regularisation. Using simulated data with five predictor‐response relationships of increasing complexity and eight levels of collinearity we compared ways to address collinearity with standard multiple regression and machine‐learning approaches. We assessed the performance of each approach by testing its impact on prediction to new data. In the extreme, we tested whether the methods were able to identify the true underlying relationship in a training dataset with strong collinearity by evaluating its performance on a test dataset without any collinearity. We found that methods specifically designed for collinearity, such as latent variable methods and tree‐based models, did not outperform the traditional GLM and threshold‐based pre‐selection. Our results highlight the value of GLM in combination with penalised methods (particularly ridge) and threshold‐based pre‐selection when omitted variables are considered in the final interpretation.
However, all approaches tested yielded degraded predictions under change in collinearity structure, and the ‘folk lore’ threshold of |r| > 0.7 for correlation coefficients between predictor variables was an appropriate indicator for when collinearity begins to severely distort model estimation and subsequent prediction. The use of ecological understanding of the system in pre‐analysis variable selection and the choice of the least sensitive statistical approaches reduce the problems of collinearity, but cannot ultimately solve them.
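The threshold‐based pre‐selection mentioned in this abstract can be illustrated with a minimal sketch. It assumes a greedy rule that is not taken from the paper: predictors are visited in the order given, and one is dropped when its absolute Pearson correlation with an already retained predictor exceeds 0.7. The predictor names and values are hypothetical, and in practice the choice of which variable to keep from a correlated pair would rest on ecological judgement rather than on ordering.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def preselect(predictors, threshold=0.7):
    """Keep predictors in the given order, skipping any whose |r| with an
    already kept predictor exceeds the threshold (illustrative greedy rule)."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictors: elevation is almost a linear function of temperature.
predictors = {
    "temperature": [1, 2, 3, 4, 5, 6],
    "elevation": [12, 10, 8.5, 6, 4.2, 2],
    "rainfall": [5, 1, 4, 2, 6, 3],
}
selected = preselect(predictors)
print(selected)
```

Here elevation is dropped because it is nearly collinear with temperature, while rainfall is retained. As the abstract stresses, such pre‐selection reduces but does not remove the problem when the collinearity structure changes between training and prediction data.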
Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation
Volume 31, Issue 2 - Pages 161-175 - 2008
Steven J. Phillips, Miroslav Dudík
Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time‐consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use “default settings”, tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence‐only data. We evaluate our method on independently collected high‐quality presence‐absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce “hinge features” that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore “background sampling” strategies that cope with sample selection bias and decrease model‐building time. 
Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence‐only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) “target‐group” background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.
Methods to account for spatial autocorrelation in the analysis of species distributional data: a review
Volume 30, Issue 5 - Pages 609-628 - 2007
Carsten F. Dormann, Jana McPherson, Miguel B. Araújo, Roger Bivand, Janine Bolliger, Gudrun Carl, R. Davies, Alexandre H. Hirzel, Walter Jetz, W. Daniel Kissling, Ingolf Kühn, Ralf Ohlemüller, Pedro R. Peres‐Neto, Björn Reineking, Boris Schröder, Frank M. Schurr, Robert J. Wilson
Species distributional or trait data based on range map (extent‐of‐occurrence) or atlas survey data often display spatial autocorrelation, i.e. locations close to each other exhibit more similar values than those further apart. If this pattern remains present in the residuals of a statistical model based on such data, one of the key assumptions of standard statistical analyses, that residuals are independent and identically distributed (i.i.d), is violated. The violation of the assumption of i.i.d. residuals may bias parameter estimates and can increase type I error rates (falsely rejecting the null hypothesis of no effect). While this is increasingly recognised by researchers analysing species distribution data, there is, to our knowledge, no comprehensive overview of the many available spatial statistical methods to take spatial autocorrelation into account in tests of statistical significance. Here, we describe six different statistical approaches to infer correlates of species’ distributions, for both presence/absence (binary response) and species abundance data (poisson or normally distributed response), while accounting for spatial autocorrelation in model residuals: autocovariate regression; spatial eigenvector mapping; generalised least squares; (conditional and simultaneous) autoregressive models and generalised estimating equations. A comprehensive comparison of the relative merits of these methods is beyond the scope of this paper. To demonstrate each method's implementation, however, we undertook preliminary tests based on simulated data. These preliminary tests verified that most of the spatial modeling techniques we examined showed good type I error control and precise parameter estimates, at least when confronted with simplistic simulated data containing spatial autocorrelation in the errors. However, we found that for presence/absence data the results and conclusions were very variable between the different methods. 
This is likely due to the low information content of binary maps. Also, in contrast with previous studies, we found that autocovariate methods consistently underestimated the effects of environmental controls of species distributions. Given their widespread use, in particular for the modelling of species presence/absence data (e.g. climate envelope models), we argue that this warrants further study and caution in their use. To aid other ecologists in making use of the methods described, code to implement them in freely available software is provided in an electronic appendix.
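Whether model residuals are spatially autocorrelated is usually checked before any of the methods in this review are applied. The sketch below computes Moran's I, a common diagnostic that the abstract itself does not name, so its use here is an assumption made for illustration. It also assumes planar coordinates and binary weights (1 for pairs of locations no further apart than `max_dist`, 0 otherwise); the function name and the example transect are hypothetical.

```python
import math


def morans_i(values, coords, max_dist):
    """Moran's I with binary distance-band weights on planar coordinates."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0    # sum of w_ij * dev_i * dev_j
    w_total = 0.0  # sum of all weights w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if math.hypot(dx, dy) <= max_dist:
                cross += dev[i] * dev[j]
                w_total += 1.0
    denom = sum(d * d for d in dev)
    if w_total == 0 or denom == 0:
        raise ValueError("no neighbouring pairs, or values are constant")
    return (n / w_total) * (cross / denom)


# Four sites on a transect, one unit apart; neighbours are adjacent sites.
transect = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i([1, 1, -1, -1], transect, max_dist=1)
alternating = morans_i([1, -1, 1, -1], transect, max_dist=1)
print(clustered, alternating)
```

Positive values indicate that nearby residuals are similar, the pattern that violates the i.i.d. assumption described above; negative values indicate that neighbours tend to differ.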
A practical guide to MaxEnt for modeling species' distributions: what it does, and why inputs and settings matter
Volume 36, Issue 10 - Pages 1058-1069 - 2013
Cory Merow, Matthew J. Smith, John A. Silander
The MaxEnt software package is one of the most popular tools for species distribution and environmental niche modeling, with over 1000 published applications since 2006. Its popularity is likely for two reasons: 1) MaxEnt typically outperforms other methods based on predictive accuracy and 2) the software is particularly easy to use. MaxEnt users must make a number of decisions about how they should select their input data and choose from a wide variety of settings in the software package to build models from these data. The underlying basis for making these decisions is unclear in many studies, and default settings are apparently chosen, even though alternative settings are often more appropriate. In this paper, we provide a detailed explanation of how MaxEnt works and a prospectus on modeling options to enable users to make informed decisions when preparing data, choosing settings and interpreting output. We explain how the choice of background samples reflects prior assumptions, how nonlinear functions of environmental variables (features) are created and selected, how to account for environmentally biased sampling, the interpretation of the various types of model output and the challenges for model evaluation. We demonstrate MaxEnt's calculations using both simplified simulated data and occurrence data from South Africa on species of the flowering plant family Proteaceae. Throughout, we show how MaxEnt's outputs vary in response to different settings to highlight the need for making biologically motivated modeling decisions.
Opening the black box: an open‐source release of Maxent
Volume 40, Issue 7 - Pages 887-893 - 2017
Steven J. Phillips, Robert P. Anderson, Miroslav Dudík, Robert E. Schapire, Mary E. Blair
This software note announces a new open‐source release of the Maxent software for modeling species distributions from occurrence records and environmental data, and describes a new R package for fitting such models. The new release (ver. 3.4.0) will be hosted online by the American Museum of Natural History, along with future versions. It contains small functional changes, most notably use of a complementary log‐log (cloglog) transform to produce an estimate of occurrence probability. The cloglog transform derives from the recently‐published interpretation of Maxent as an inhomogeneous Poisson process (IPP), giving it a stronger theoretical justification than the logistic transform which it replaces by default. In addition, the new R package, maxnet, fits Maxent models using the glmnet package for regularized generalized linear models. We discuss the implications of the IPP formulation in terms of model inputs and outputs, treating occurrence records as points rather than grid cells and interpreting the exponential Maxent model (raw output) as an estimate of relative abundance. With these two open‐source developments, we invite others to freely use and contribute to the software.
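The difference between the logistic and cloglog outputs described in this abstract can be made concrete with a small sketch. It assumes that both transforms rescale the raw output by exp(H), where H is the entropy of the fitted Maxent distribution; this scaling is stated here as an assumption rather than quoted from the abstract, and the function names and the value of H are hypothetical.

```python
import math


def logistic_output(raw, entropy):
    """Logistic transform of Maxent raw output (assumed scaling by exp(H))."""
    scaled = math.exp(entropy) * raw
    return scaled / (1.0 + scaled)


def cloglog_output(raw, entropy):
    """Complementary log-log transform: 1 - exp(-exp(H) * raw)."""
    return 1.0 - math.exp(-math.exp(entropy) * raw)


H = 8.0                    # hypothetical entropy of a fitted model
typical_raw = math.exp(-H)  # raw value at which the scaled output equals 1

p_logistic = logistic_output(typical_raw, H)
p_cloglog = cloglog_output(typical_raw, H)
print(p_logistic, p_cloglog)
```

Under these assumptions a site with raw output exp(-H) maps to 0.5 on the logistic scale and to 1 - 1/e, about 0.632, on the cloglog scale. For any positive raw value the cloglog output is the larger of the two.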
spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models
Volume 38, Issue 5 - Pages 541-545 - 2015
Matthew Aiello‐Lammens, Robert A. Boria, Aleksandar Radosavljević, Bruno Vilela, Robert P. Anderson
Spatial thinning of species occurrence records can help address problems associated with spatial sampling biases. Ideally, thinning removes the fewest records necessary to substantially reduce the effects of sampling bias, while simultaneously retaining the greatest amount of useful information. Spatial thinning can be done manually; however, this is prohibitively time consuming for large datasets. Using a randomization approach, the ‘thin’ function in the spThin R package returns a dataset with the maximum number of records for a given thinning distance, when run for sufficient iterations. We here provide a worked example for the Caribbean spiny pocket mouse, where the results obtained match those of manual thinning.
Where is positional uncertainty a problem for species distribution modelling?
Volume 37, Issue 2 - Pages 191-203 - 2014
Babak Naimi, Nicholas Hamm, T.A. Groen, Andrew K. Skidmore, Albertus G. Toxopeus
Species data held in museums and herbaria, survey data and opportunistically observed data are a substantial information resource. A key challenge in using these data is the uncertainty about where an observation is located. This is important when the data are used for species distribution modelling (SDM), because the coordinates are used to extract the environmental variables and thus, positional error may lead to inaccurate estimation of the species–environment relationship. The magnitude of this effect is related to the level of spatial autocorrelation in the environmental variables. Using local spatial association can be relevant because it can lead to the identification of the specific occurrence records that cause the largest drop in SDM accuracy. Therefore, in this study, we tested whether the SDM predictions are more affected by positional uncertainty originating from locations that have lower local spatial association in their predictors. We performed this experiment for Spain and the Netherlands, using simulated datasets derived from well‐known species distribution models (SDMs). We used the K statistic to quantify the local spatial association in the predictors at each species occurrence location. A probabilistic approach using Monte Carlo simulations was employed to introduce the error in the species locations. The results revealed that positional uncertainty in species occurrence data at locations with low local spatial association in predictors reduced the prediction accuracy of the SDMs. We propose that local spatial association is a way to identify the species occurrence records that require treatment for positional uncertainty. We also developed and present a tool in the R environment to target observations that are likely to create error in the output from SDMs as a result of positional uncertainty.
ecospat: an R package to support spatial analyses and modeling of species niches and distributions
Volume 40, Issue 6 - Pages 774-787 - 2017
Valeria Di Cola, Olivier Broennimann, Blaise Petitpierre, Frank T. Breiner, Manuela D’Amen, Christophe F. Randin, Robin Engler, Antoine Guisan, Dorothea Pio, Luigi Maiorano, Loïc Pellissier, Rubén G. Mateo, Wim Hordijk, Nicolas Salamin
The aim of the ecospat package is to make available novel tools and methods to support spatial analyses and modeling of species niches and distributions in a coherent workflow. The package is written in the R language (R Development Core Team) and contains several features, unique in their implementation, that are complementary to other existing R packages. Pre‐modeling analyses include species niche quantifications and comparisons between distinct ranges or time periods, measures of phylogenetic diversity, and other data exploration functionalities (e.g. extrapolation detection, ExDet). Core modeling brings together the new approach of ensemble of small models (ESM) and various implementations of the spatially‐explicit modeling of species assemblages (SESAM) framework. Post‐modeling analyses include evaluation of species predictions based on presence‐only data (Boyce index) and of community predictions, phylogenetic diversity and environmentally‐constrained species co‐occurrences analyses. The ecospat package also provides some functions to supplement the ‘biomod2’ package (e.g. data preparation, permutation tests and cross‐validation of model predictive power). With this novel package, we intend to stimulate the use of comprehensive approaches in spatial modelling of species and community distributions.
Transferability and model evaluation in ecological niche modeling: a comparison of GARP and Maxent
Volume 30, Issue 4 - Pages 550-560 - 2007
A. Townsend Peterson, Monica Papeş, Muir D. Eaton
We compared predictive success in two common algorithms for modeling species’ ecological niches, GARP and Maxent, in a situation that challenged the algorithms to be general – that is, to be able to predict the species’ distributions in broad unsampled regions, here termed transferability. The results were strikingly different between the two algorithms – Maxent models reconstructed the overall distributions of the species at low thresholds, but higher predictive levels of Maxent predictions reflected overfitting to the input data; GARP models, on the other hand, succeeded in anticipating most of the species’ distributional potential, at the cost of increased (apparent, at least) commission error. Receiver operating characteristic (ROC) tests were weak in discerning models able to predict into broad unsampled areas from those that were not. Such transferability is clearly a novel challenge for modeling algorithms, and requires different qualities than does predicting within densely sampled landscapes – in this case, Maxent was transferable only at very low thresholds, and biases and gaps in input data may frequently affect results based on higher Maxent thresholds, requiring careful interpretation of model results.