Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction

Anantha Prasad1, Louis R. Iverson1, Andy Liaw2
1Northeastern Research Station, USDA Forest Service, Delaware, Ohio, USA
2Biometrics Research Department, Merck Research Laboratories, Rahway, USA

Abstract

Keywords


References

Abraham A, Steinberg D (2001) MARS: Still an alien planet in soft computing? In: Alexandrov VN, Dongarra JJ, Juliano BA, Renner RS, Tan CJK (eds) Lecture notes in computer science 2074. Springer, Berlin Heidelberg New York, p 235–244

Baker FA (1993) Classification and regression tree analysis for assessing hazard of pine mortality caused by Heterobasidion annosum. Plant Dis 77:136–9

Boer GJ, Flato GM, Ramsden D (2000) A transient climate change simulation with historical and projected greenhouse gas and aerosol forcing: projected climate for the 21st century. Clim Dyn 16:427–51

Breiman L (1996a) Bagging predictors. Mach Learn 24:123–40

Breiman L (1996b) Out-of-bag estimation. Technical report. Department of Statistics, University of California, Berkeley

Breiman L (2001) Random forests. Mach Learn 45:5–32

Breiman L (2002) Using models to infer mechanisms. IMS Wald Lecture 2. [online] URL: http://www.oz.berkeley.edu/users/breiman/wald2002-2.pdf

Breiman L, Cutler A (2003) Setting up, using, and understanding Random Forests v4.0. [online] URL: http://www.stat.berkeley.edu/users/breiman/rf.html

Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth, Belmont (CA), p 358

Buhlmann P, Yu B (2002) Analyzing bagging. Ann Stat 30:927–61

Chambers JM (1998) Programming with data: a guide to the S language. Springer, Berlin Heidelberg New York, p 469

Chambers JM, Hastie TJ (1993) Statistical models in S. Chapman & Hall, New York, 608 p

Chan JCW, Huang C, DeFries R (2001) Enhanced algorithm performance for land cover classification using bootstrap aggregating (bagging). IEEE Trans Geosci Remote Sens 39(3):693–5

Clark JS (1998) Why trees migrate so fast: confronting theory with dispersal biology and the paleorecord. Am Nat 152:204–24

Clark LA, Pregibon D (1992) Tree-based models. In: Chambers JM, Hastie TJ (eds) Statistical models in S. Wadsworth, Pacific Grove (CA), p 377–419

Davis MB (1989) Lags in vegetation response to greenhouse warming. Clim Change 15:75–82

De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–92

Dobbertin M, Biging GS (1998) Using the non-parametric classifier CART to model forest tree mortality. For Sci 44(4):507–16

Environmental Systems Research Institute (2001) Arc ver. 8.1.2. Environmental Systems Research Institute, Redlands (CA)

Franklin J (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog Phys Geogr 19:494–519

Franklin J (1998) Predicting the distribution of shrub species in southern California from climate and terrain-derived variables. J Veg Sci 9:733–48

Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19:1–141

Freund Y (1995) Boosting a weak learning algorithm by majority. Inf Comput 121:256–85

Furlanello C, Neteler M, Merler S, Menegon S, Fontanari S, Donini A, Rizzoli A, Chemini C (2003) GIS and the Random Forests predictor: integration in R for tick-borne disease risk assessment. In: Hornik K, Leisch F, Zeileis A (eds) Proceedings of the 3rd international workshop on distributed statistical computing. Vienna, Austria, p 1–11

Hagen A (2002) Technical report: comparison of maps containing nominal data. RIVM project: MAP-SOR S/550002/01/RO, order no. 143699. Research Institute for Knowledge Systems, Maastricht (The Netherlands)

Hagen A (2003) Fuzzy set approach to assessing similarity of categorical maps. Int J Geogr Inf Sci 17(3):235–49

Hansen M, Dubayah R, Defries R (1996) Classification trees: an alternative to traditional land cover classifiers. Int J Remote Sens 17(5):1075–81

Hansen MH, Frieswyk T, Glover JF, Kelly JF (1992) The Eastwide forest inventory data base: users manual. General technical report NC-151. US Department of Agriculture, Forest Service, North Central Forest Experiment Station, St. Paul (MN), 48 p

Hawkins DM, Musser BJ (1999) One tree or a forest? Alternative dendrographic models. Comput Sci Stat 30:534–42

Hernandez JE, Epstein LD, Rodriguez MH, Rodriguez AD, Rejmankova E, Roberts DR (1997) Use of generalized regression tree models to characterize vegetation favoring Anopheles albimanus breeding. J Am Mosq Control Assoc 13(1):28–34

Higgins SI, Lavorel S, Revilla EE (2003) Estimating plant migration rates under habitat loss and fragmentation. Oikos 101:354–66

Hobbs RJ (1994) Dynamics of vegetation mosaics: can we predict responses to global change? Ecoscience 1(4):346–56

Hothorn T, Lausen B, Benner A, Radespiel-Troger M (2004) Bagging survival trees. Stat Med 23:77–91

Iverson LR, Prasad AM (1998) Predicting abundance of 80 tree species following climate change in the eastern United States. Ecol Monogr 68:465–85

Iverson LR, Prasad AM (2002) Potential redistribution of tree species habitat under five climate change scenarios in the eastern US. For Ecol Manage 155(1–3):205–22

Iverson LR, Prasad AM, Hale BJ, Sutherland EK (1999a) An atlas of current and potential future distributions of common trees of the eastern United States. General technical report NE-265. Northeastern Research Station, USDA Forest Service, 245 p

Iverson LR, Prasad AM, Schwartz MW (1999b) Modeling potential future individual tree-species distributions in the Eastern United States under a climate change scenario: a case study with Pinus virginiana. Ecol Model 115:77–93

Kittel TGF, Rosenbloom NA, Kaufman C, Royle JA, Daly C, Fisher HH, and others (2000) VEMAP phase 2 historical and future scenario climate database. ORNL Distributed Active Archive Center, Oak Ridge National Laboratory, Oak Ridge (TN). [online] URL: http://www.daac.ornl.gov/

Lees BG, Ritman K (1991) Decision-tree and rule-induction approach to integration of remotely sensed and GIS data in mapping vegetation in disturbed or hilly environments. Environ Manage 15:823–31

Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22. [online] URL: http://www.CRAN.R-project.org/doc/Rnews/

Little EL (1971) Atlas of United States trees; vol 1. Conifers and important hardwoods. Miscellaneous publication 1146. US Department of Agriculture, Forest Service, Washington (DC), 200 p

Little EL (1977) Atlas of United States trees; vol 4. Minor eastern hardwoods. Miscellaneous publication 1342. US Department of Agriculture, Forest Service, Washington (DC), 230 p

Malcolm JR, Markham A, Neilson RP, Garaci M (2002) Estimated migration rates under scenarios of global climate change. J Biogeogr 29:835–49

Map Comparison Kit (2003) Research Institute for Knowledge Systems, Netherlands. [online] URL: http://www.riks.nl

Meyer D, Leisch F, Hornik K (2003) The support vector machine under test. Neurocomputing 55:59–71

Michaelsen J, Schimel DS, Friedl MA, Davis FW, Dubayah RC (1994) Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys. J Veg Sci 5:673–86

Miller JR, Turner MG, Smithwick EAH, Dent CL, Stanley EH (2004) Spatial extrapolation: the science of predicting ecological patterns and processes. BioScience 54(4):310–20

Moisen GG, Frescino T (2002) Comparing five modelling techniques for predicting forest characteristics. Ecol Model 157:209–25

Monserud RA, Leemans R (1992) Comparing global vegetation maps with the Kappa statistic. Ecol Model 62:275–93

Moore DE, Lees BG, Davey SM (1991) A new method for predicting vegetation distributions using decision tree analysis in a geographic information system. J Environ Manage 15:59–71

Munoz J, Felicisimo AM (2004) Comparison of statistical methods commonly used in predictive modelling. J Veg Sci 15:285–92

Peters A, Hothorn T, Lausen B (2002) ipred: Improved predictors. R News 2(2):22–6. [online] URL: http://www.CRAN.R-project.org/doc/Rnews/

Pitelka LF, Plant Migration Workshop Group (1997) Plant migration and climate change. Am Sci 85:464–73

Pontius RG Jr (2000) Quantification error versus location error in comparison of categorical maps. Photogram Eng Remote Sens 66(8):1011–16

Power C, Simms A (2001) Hierarchical fuzzy pattern matching for regional comparison of land use maps. Int J Geogr Inf Sci 15(1):77–100

Prasad AM, Iverson LR (2000a) A climate change atlas for 80 forest tree species of the eastern United States [database]. [online] URL: http://www.fs.fed.us/ne/delaware/atlas

Prasad AM, Iverson LR (2000b) Predictive vegetation mapping using a custom built model-chooser: comparison of regression tree analysis and multivariate adaptive regression splines. In: Proceedings CD-ROM. 4th International Conference on Integrating GIS and Environmental Modeling: Problems, Prospects and Research Needs. Banff, Alberta, Canada. [online] URL: http://www.colorado.edu/research/cires/banff/upload/159/index.html

Prasad AM, Iverson LR (2003) Little’s range and FIA importance value database for 135 eastern US tree species. Northeastern Research Station, USDA Forest Service, Delaware, Ohio. [online] URL: http://www.fs.fed.us/ne/delaware/4153/global/littlefia/index.html

R Development Core Team (2004) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (Austria). [online] URL: http://www.R-project.org

Reichard SH, Hamilton CW (1997) Predicting invasion of woody plants introduced into North America. Conserv Biol 11:193–203

Schapire RE, Freund Y, Bartlett P, Lee W (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5):1651–86

Schwartz MW, Iverson LR, Prasad AM (2001) Predicting the potential future distribution of four tree species in Ohio, USA, using current habitat availability and climatic forcing. Ecosystems 4:568–81

Skurichina M, Duin RPW (2002) Bagging, boosting and the random subspace method for linear classifiers. Pattern Anal Appl 5:121–35

Steinberg D, Colla PL, Martin K (1999) MARS user guide. Salford Systems, San Diego (CA)

Stroppiana D, Gregoire J-M, Pereira JMC (2003) The use of SPOT VEGETATION data in a classification tree approach for burnt area mapping in Australian savanna. Int J Remote Sens 24:2131–51

Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP (2003) Random Forests: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 43(6):1947–58

Therneau TM, Atkinson EJ (1997) An introduction to recursive partitioning using the RPART routines. Technical report no. 61. Mayo Clinic, Rochester (MN), 52 p

Verbyla DL (1987) Classification trees: a new discrimination tool. Can J For Res 17:1150–52