Controlling patterns of geospatial phenomena

Springer Science and Business Media LLC - Volume 15 - Pages 399-416 - 2010
Tomasz F. Stepinski1, Wei Ding2, Christoph F. Eick3
1Lunar and Planetary Institute, Houston, USA
2Department of Computer Science, University of Massachusetts Boston, Boston, USA
3Department of Computer Science, University of Houston, Houston, USA

Abstract

Modeling spatially distributed phenomena in terms of their controlling factors is a recurring problem in geoscience. Most efforts concentrate on predicting the value of a response variable in terms of controlling variables, either through a physical model or a regression model. However, many geospatial systems comprise complex, nonlinear, and spatially non-uniform relationships, making it difficult to even formulate a viable model. This paper focuses on spatial partitioning of controlling variables that are attributed to a particular range of a response variable. Thus, the presented method surveys spatially distributed relationships between predictors and response. The method is based on the association analysis technique of identifying emerging patterns, which are extended in order to be applied more effectively to geospatial data sets. The outcome of the method is a list of spatial footprints, each characterized by a unique “controlling pattern”: a list of specific values of predictors that locally correlate with a specified value of the response variable. Mapping the controlling footprints reveals the geographic regionalization of the relationship between predictors and response. The data mining underpinnings of the method are given, and its application to a real-world problem is demonstrated using an expository example that focuses on determining the variety of environmental associations of high vegetation density across the continental United States.
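To make the notion of an emerging pattern concrete, the following is a minimal sketch of the basic idea only, not the extended method the paper presents. It assumes the commonly used definition in which a pattern is "emerging" when its support in a target class is much larger than in the remaining data, measured by a growth rate. The grid cells, the binned predictor labels (`precip=high`, `temp=mild`, and so on), the function names, and the threshold are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def support(pattern, cells):
    """Fraction of cells whose item set contains every item of the pattern."""
    if not cells:
        return 0.0
    return sum(1 for c in cells if pattern <= c) / len(cells)


def growth_rate(pattern, background, target):
    """Ratio of the pattern's support in the target class to the background."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_tg == 0.0:
        return 0.0
    if s_bg == 0.0:
        return float("inf")
    return s_tg / s_bg


def emerging_patterns(background, target, min_growth, max_size=2):
    """Brute-force search over small item sets; fine for a toy example only."""
    items = sorted(set().union(*target))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            rate = growth_rate(pattern, background, target)
            if rate >= min_growth:
                found.append((pattern, rate))
    return found


def footprint(pattern, cells):
    """Indices of the cells that support the pattern (its spatial footprint)."""
    return [i for i, c in enumerate(cells) if pattern <= c]


# Hypothetical cells: each cell is a set of binned predictor values.
# "target" holds cells with a high response; "background" holds the rest.
target = [
    {"precip=high", "temp=mild"},
    {"precip=high", "temp=mild"},
    {"precip=high", "temp=cold"},
    {"precip=low", "temp=mild"},
]
background = [
    {"precip=low", "temp=mild"},
    {"precip=low", "temp=hot"},
    {"precip=high", "temp=hot"},
    {"precip=low", "temp=cold"},
]

patterns = emerging_patterns(background, target, min_growth=2.0)
for pattern, rate in patterns:
    print(sorted(pattern), rate, footprint(pattern, target))
```

Each reported pattern is a candidate controlling pattern in this toy setting, and `footprint` lists the target cells where it holds. A practical implementation would replace the brute-force enumeration with a dedicated pattern-mining algorithm and would map footprints back to geographic coordinates.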
