Mapping Large Spatial Flow Data with Hierarchical Clustering

Transactions in GIS - Volume 18, Issue 3 - Pages 421-435 - 2014
Xi Zhu (1, 2), Diansheng Guo (1)
(1) Department of Geography, University of South Carolina
(2) School of Hydropower and Information Engineering, Huazhong University of Science and Technology

Abstract

It is challenging to map large spatial flow data due to the problem of occlusion and cluttered display, where hundreds of thousands of flows overlap and intersect each other. Existing flow mapping approaches often aggregate flows using predetermined high‐level geographic units (e.g. states) or bundle partial flow lines that are close in space, both of which cause a significant loss or distortion of information and may miss major patterns. In this research, we developed a flow clustering method that extracts clusters of similar flows to avoid the cluttering problem, reveal abstracted flow patterns, and preserve data resolution as much as possible. Specifically, our method extends the traditional hierarchical clustering method to aggregate and map large flow data. The new method considers both origins and destinations in determining the similarity of two flows, which ensures that a flow cluster represents flows from similar origins to similar destinations and thus minimizes information loss during aggregation. With a spatial index and search algorithm, the new method is scalable to large flow data sets. As a hierarchical method, it generalizes flows to different hierarchical levels and has the potential to support multi‐resolution flow mapping. Different distance definitions can be incorporated to adapt to the uneven spatial distribution of flows and to detect flow clusters of different densities. To assess the quality and fidelity of flow clusters and flow maps, we carry out a case study to analyze a data set of 243,850 taxi trips within an urban area.
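The idea of judging flow similarity by both origins and destinations can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a hypothetical flow distance (the larger of the origin gap and the destination gap), uses plain complete-linkage agglomerative clustering, and replaces the spatial index with a brute-force search, so it only suits tiny inputs. The names `flow_distance`, `cluster_flows`, and `threshold`, and the sample coordinates, are illustrative assumptions.

```python
import math
from itertools import combinations

def flow_distance(a, b):
    """Assumed dissimilarity between two flows (ox, oy, dx, dy).

    Takes the larger of the origin gap and the destination gap, so two
    flows are close only if BOTH endpoints are close. The paper's actual
    distance definition may differ.
    """
    origin_gap = math.dist(a[:2], b[:2])
    dest_gap = math.dist(a[2:], b[2:])
    return max(origin_gap, dest_gap)

def cluster_flows(flows, threshold):
    """Naive complete-linkage agglomerative clustering of flows.

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `threshold`. Returns sorted lists of flow indices.
    Brute force: no spatial index, so cost grows quickly with input size.
    """
    clusters = [[i] for i in range(len(flows))]

    def linkage(c1, c2):
        # Complete linkage: distance of the farthest pair across clusters.
        return max(flow_distance(flows[i], flows[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (p, q), best = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)

# Made-up flows: 0 and 1 share both origin and destination areas,
# 2 shares only the origin area, 3 runs roughly the opposite way.
flows = [
    (0.0, 0.0, 10.0, 10.0),
    (0.5, 0.0, 10.0, 10.5),
    (0.0, 0.5, -10.0, -10.0),
    (9.0, 9.0, 1.0, 1.0),
]
clusters = cluster_flows(flows, threshold=1.0)
print(clusters)  # [[0, 1], [2], [3]]
```

Flow 2 starts next to flows 0 and 1 but ends far away, so it stays in its own cluster. Aggregating by origin alone would have merged it with them. Raising `threshold` gives coarser clusters, which loosely mirrors the multi-resolution behaviour the abstract describes.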
