Accelerating high-dimensional nearest neighbor queries
Proceedings 14th International Conference on Scientific and Statistical Database Management - Trang 109-118
Tóm tắt
The performance of nearest neighbor (NN) queries degrades noticeably with increasing dimensionality of the data due to reduced selectivity of high-dimensional data and an increased number of seek operations during NN-query execution. If the NN-radii were known in advance, the disk accesses could be reordered such that seek operations are minimized. We therefore propose a new way of estimating the NN-radius based on the fractal dimensionality and sampling. It is applicable to any page-based index structure. We show that the estimation error is considerably lower than for previous approaches. In the second part of the paper, we present two applications of this technique. We show how the radius estimations can be used to transform k-NN queries into at most two range queries, and how it can be used to reduce the number of page reads during all-NN queries. In both cases, we observe significant speedups over traditional techniques for synthetic and real-world data.
Từ khóa
#Acceleration #Nearest neighbor searches #Neural networks #Sampling methods #Fractals #Image segmentation #Genomics #Bioinformatics #Databases #Computer scienceTài liệu tham khảo
10.1145/602259.602266
hjaltason, 1995, Ranking in spatial databases, A duaries in Spatial Databases – Fourth International Symposium (SSD), 83
10.1145/276304.276326
huang, 1997, Spatial joins using r-trees: Breadth-first traversal with global optimizations, Proceedings of the Int Conf on Very Large Data Bases, 396
korn, 2000, Deflating the dimensionality curse using multiple fractal dimensions, Proc Int Conf Data Engineering
10.1109/ICDE.1998.655809
10.1145/375663.375716
10.1109/SSDM.2002.1029711
10.1007/3-540-62222-5_59
ramakrishnan, 2000, Datab aseManagement Systems
10.1145/375663.375714
berc htold, 1996, The X-tree: An index structure for high-dimensional data, Proceedings of the Int Conf on Very Large Data Bases, 28
10.1145/170035.170075
10.1109/ICDE.2000.839418
donjerkovic, 1999, Probabilistic optimization of top n queries, VLDB, 411
chaudhuri, 1999, Evaluating top-k selection queries, VLDB, 397
berc htold, 2000, Independent quan tization: An index compression technique for high-dimensional data spaces, Proc Int Conf on Data Engineering, 10.1109/ICDE.2000.839456
belussi, 1995, Estimating the selectivity of spatial queries using the ‘correlation’ fractal dimension, Proceedings of the Int Conf on Very Large Data Bases, 299
10.1145/182591.182593
seeger, 1993, Reading a set of disk pages, Proceedings of the Int Conf on Very Large Data Bases, 592
weber, 1997, An approximation based data structure for similarit y search, Technical Report 24 ESPRIT Project HERMES (No 9141)