Accelerating high-dimensional nearest neighbor queries

Proceedings 14th International Conference on Scientific and Statistical Database Management - Trang 109-118

C.A. Lang¹, A.K. Singh¹

¹Department of Computer Science, University of California, Santa Barbara, CA, USA

Tóm tắt

The performance of nearest neighbor (NN) queries degrades noticeably with increasing dimensionality of the data due to reduced selectivity of high-dimensional data and an increased number of seek operations during NN-query execution. If the NN-radii were known in advance, the disk accesses could be reordered such that seek operations are minimized. We therefore propose a new way of estimating the NN-radius based on the fractal dimensionality and sampling. It is applicable to any page-based index structure. We show that the estimation error is considerably lower than for previous approaches. In the second part of the paper, we present two applications of this technique. We show how the radius estimations can be used to transform k-NN queries into at most two range queries, and how it can be used to reduce the number of page reads during all-NN queries. In both cases, we observe significant speedups over traditional techniques for synthetic and real-world data.

Từ khóa

#Acceleration #Nearest neighbor searches #Neural networks #Sampling methods #Fractals #Image segmentation #Genomics #Bioinformatics #Databases #Computer science

Tài liệu tham khảo

10.1145/602259.602266 hjaltason, 1995, Ranking in spatial databases, A duaries in Spatial Databases – Fourth International Symposium (SSD), 83 10.1145/276304.276326 huang, 1997, Spatial joins using r-trees: Breadth-first traversal with global optimizations, Proceedings of the Int Conf on Very Large Data Bases, 396 korn, 2000, Deflating the dimensionality curse using multiple fractal dimensions, Proc Int Conf Data Engineering 10.1109/ICDE.1998.655809 10.1145/375663.375716 10.1109/SSDM.2002.1029711 10.1007/3-540-62222-5_59 ramakrishnan, 2000, Datab aseManagement Systems 10.1145/375663.375714 berc htold, 1996, The X-tree: An index structure for high-dimensional data, Proceedings of the Int Conf on Very Large Data Bases, 28 10.1145/170035.170075 10.1109/ICDE.2000.839418 donjerkovic, 1999, Probabilistic optimization of top n queries, VLDB, 411 chaudhuri, 1999, Evaluating top-k selection queries, VLDB, 397 berc htold, 2000, Independent quan tization: An index compression technique for high-dimensional data spaces, Proc Int Conf on Data Engineering, 10.1109/ICDE.2000.839456 belussi, 1995, Estimating the selectivity of spatial queries using the ‘correlation’ fractal dimension, Proceedings of the Int Conf on Very Large Data Bases, 299 10.1145/182591.182593 seeger, 1993, Reading a set of disk pages, Proceedings of the Int Conf on Very Large Data Bases, 592 weber, 1997, An approximation based data structure for similarit y search, Technical Report 24 ESPRIT Project HERMES (No 9141)

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA