The VLDB Journal

Featured scientific publications

* Data is for reference only

Rule-based spatiotemporal query processing for video databases
The VLDB Journal - Volume 13 - Pages 86-103 - 2004
Mehmet Emin Dönderler, Özgür Ulusoy, Ugur Güdükbay
In our earlier work, we proposed an architecture for a Web-based video database management system (VDBMS) providing an integrated support for spatiotemporal and semantic queries. In this paper, we focus on the task of spatiotemporal query processing and also propose an SQL-like video query language that has the capability to handle a broad range of spatiotemporal queries. The language is rule-based in that it allows users to express spatial conditions in terms of Prolog-type predicates. Spatiotemporal query processing is carried out in three main stages: query recognition, query decomposition, and query execution.
Fast fully dynamic labelling for distance queries
The VLDB Journal - Volume 31 - Pages 483-506 - 2021
Muhammad Farhan, Qing Wang, Yu Lin, Brendan McKay
Finding the shortest-path distance between an arbitrary pair of vertices is a fundamental problem in graph theory. A tremendous amount of research has explored this problem, most of which is limited to static graphs. Due to the dynamic nature of real-world networks, such as social networks or web graphs in which a link between two entities may fail or become alive at any time, there is a pressing need to address this problem for dynamic networks. Existing work can only accommodate distance queries over moderately large dynamic networks due to high space cost and long pre-processing time required for constructing distance labelling, and even on such moderately large dynamic networks, distance labelling can hardly be updated efficiently. In this article, we propose a fully dynamic labelling method to efficiently update distance labelling so as to answer distance queries over large dynamic graphs. At its core, our proposed method incorporates two building blocks: (i) incremental algorithm for handling incremental update operations, i.e. edge insertions, and (ii) decremental algorithm for handling decremental update operations, i.e. edge deletions. These building blocks are built in a highly scalable framework of distance query answering. We theoretically prove the correctness of our fully dynamic labelling method and its preservation of the minimality of labelling. We have also evaluated on 13 real-world large complex networks to empirically verify the efficiency, scalability and robustness of our method.
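To make the idea of answering distance queries from a labelling concrete, here is a minimal sketch in Python. It is not the method of the article above: it assumes a generic hub-based labelling in which every vertex stores its distance to a set of hubs, uses every vertex as a hub so that answers are exact, and rebuilds all labels from scratch after an edge insertion. The function names (`bfs_distances`, `build_labels`, `query_distance`) and the toy graph are hypothetical. The full rebuild is the naive baseline that a dynamic labelling method aims to avoid.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def build_labels(adj, hubs):
    """For each vertex, store {hub: distance to hub} (assumed labelling scheme)."""
    labels = {v: {} for v in adj}
    for h in hubs:
        for v, d in bfs_distances(adj, h).items():
            labels[v][h] = d
    return labels


def query_distance(labels, u, v):
    """Answer a distance query using only the two labels, via common hubs."""
    common = labels[u].keys() & labels[v].keys()
    if not common:
        return float("inf")
    return min(labels[u][h] + labels[v][h] for h in common)


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = build_labels(adj, hubs=list(adj))
d_before = query_distance(labels, 0, 3)

# Edge insertion (0, 3): the naive approach recomputes every label.
adj[0].append(3)
adj[3].append(0)
labels = build_labels(adj, hubs=list(adj))
d_after = query_distance(labels, 0, 3)
```

Using every vertex as a hub keeps the sketch exact but costs quadratic space, which is why practical labellings keep labels small and update them incrementally.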
Label-constrained shortest path query processing on road networks
The VLDB Journal -
Junhua Zhang, Long Yuan, Wentao Li, Lu Qin, Ying Zhang, Wenjie Zhang
RailwayDB: adaptive storage of interaction graphs
The VLDB Journal - Volume 25 - Pages 151-169 - 2015
Robert Soulé, Buğra Gedik
We are living in an ever more connected world, where data recording the interactions between people, software systems, and the physical world is becoming increasingly prevalent. These data often take the form of a temporally evolving graph, where entities are the vertices and the interactions between them are the edges. We call such graphs interaction graphs. Various domains, including telecommunications, transportation, and social media, depend on analytics performed on interaction graphs. The ability to efficiently support historical analysis over interaction graphs requires effective solutions for the problem of data layout on disk. This paper presents an adaptive disk layout called the railway layout for optimizing disk block storage for interaction graphs. The key idea is to divide blocks into one or more sub-blocks. Each sub-block contains the entire graph structure, but only a subset of the attributes. This improves query I/O, at the cost of increased storage overhead. We introduce optimal integer linear program (ILP) formulations for partitioning disk blocks into sub-blocks with overlapping and nonoverlapping attributes. Additionally, we present greedy heuristics that can scale better compared to the ILP alternatives, yet achieve close to optimal query I/O. We provide an implementation of the railway layout as part of RailwayDB—an open-source graph database we have developed. To demonstrate the benefits of the railway layout, we provide an extensive experimental evaluation, including model-based as well as empirical results comparing our approach to baseline alternatives.
Special Issue: Modern Hardware
The VLDB Journal - Volume 25 - Pages 623-624 - 2016
Peter Boncz, Wolfgang Lehner, Thomas Neumann
Diversifying recommendations on sequences of sets
The VLDB Journal - Volume 32 - Pages 283-304 - 2022
Sepideh Nikookar, Mohammadreza Esfandiari, Ria Mae Borromeo, Paras Sakharkar, Sihem Amer-Yahia, Senjuti Basu Roy
Diversifying recommendations on a sequence of sets (or sessions) of items captures a variety of applications. Notable examples include recommending online music playlists, where a session is a channel and multiple channels are listened to in sequence, or recommending tasks in crowdsourcing, where a session is a set of tasks and multiple task sessions are completed in sequence. Item diversity can be defined in more than one way, e.g., as a genre diversity for music, or as a function of reward in crowdsourcing. A user who engages in multiple sessions may intend to experience diversity within and/or across sessions. Intra session diversity is set-based, whereas Inter session diversity is naturally sequence-based. This novel formulation gives rise to four bi-objective problems with the goal of minimizing or maximizing Inter and Intra diversities. We prove hardness and develop efficient algorithms with theoretical guarantees. Our experiments with human subjects on two real datasets show that our diversity formulations do serve different user needs and yield high user satisfaction. Our large-scale experiments on real and synthetic data empirically demonstrate that our solutions satisfy our theoretical bounds and are highly scalable, compared to baselines.
Non-binary evaluation measures for big data integration
The VLDB Journal - - 2017
Tomer Sagi, Avigdor Gal
The evolution of data accumulation, management, analytics, and visualization has led to the coining of the term big data, which challenges the task of data integration. This task, common to any matching problem in computer science, involves generating alignments between structured data in an automated fashion. Historically, set-based measures, based upon binary similarity matrices (match/non-match), have dominated evaluation practices of matching tasks. However, in the presence of big data, such measures no longer suffice. In this work, we propose evaluation methods for non-binary matrices as well. Non-binary evaluation is formally defined together with several new, non-binary measures using a vector space representation of matching outcome. We provide empirical analyses of the usefulness of non-binary evaluation and show its superiority over its binary counterparts in several problem domains.
View matching for outer-join views
The VLDB Journal - Volume 16 - Pages 29-53 - 2006
Per-Åke Larson, Jingren Zhou
Prior work on computing queries from materialized views has focused on views defined by expressions consisting of selection, projection, and inner joins, with an optional aggregation on top (SPJG views). This paper provides a view matching algorithm for views that may also contain outer joins (SPOJG views). The algorithm relies on a normal form for outer-join expressions and is not based on bottom-up syntactic matching of expressions. It handles any combination of inner and outer joins, deals correctly with SQL bag semantics, and exploits not-null constraints, uniqueness constraints and foreign key constraints.
Consistent selectivity estimation via maximum entropy
The VLDB Journal - Volume 16 - Pages 55-76 - 2006
V. Markl, P. J. Haas, M. Kutsch, N. Megiddo, U. Srivastava, T. M. Tran
Cost-based query optimizers need to estimate the selectivity of conjunctive predicates when comparing alternative query execution plans. To this end, advanced optimizers use multivariate statistics to improve information about the joint distribution of attribute values in a table. The joint distribution for all columns is almost always too large to store completely, and the resulting use of partial distribution information means that multiple, non-equivalent selectivity estimates may be available for a given predicate. Current optimizers use cumbersome ad hoc methods to ensure that selectivities are estimated in a consistent manner. These methods ignore valuable information and tend to bias the optimizer toward query plans for which the least information is available, often yielding poor results. In this paper, we present a novel method for consistent selectivity estimation based on the principle of maximum entropy (ME). Our method exploits all available information and avoids the bias problem. In the absence of detailed knowledge, the ME approach reduces to standard uniformity and independence assumptions. Experiments with our prototype implementation in DB2 UDB show that use of the ME approach can improve the optimizer's cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times. For almost all queries, these improvements are obtained while adding only tens of milliseconds to the overall time required for query optimization.
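The maximum-entropy idea can be illustrated with a small Python sketch. It is not the implementation described in the abstract: it assumes a brute-force enumeration of all 2^n truth assignments ("atoms") of n predicates and uses iterative proportional fitting, one standard way to approach a maximum-entropy distribution under known marginals. The function names, the constraint format (a tuple of predicate indices mapped to a known selectivity), and the numbers are hypothetical. Known selectivities are assumed to lie strictly between 0 and 1 and to be mutually consistent.

```python
from itertools import product


def max_entropy_atoms(n, known, sweeps=200):
    """Fit atom probabilities to the known selectivities, starting from uniform.

    known maps a tuple of predicate indices to the selectivity of their
    conjunction, e.g. {(0,): 0.1, (0, 1): 0.05}.
    """
    atoms = list(product((0, 1), repeat=n))
    prob = {a: 1.0 / len(atoms) for a in atoms}
    for _ in range(sweeps):
        for preds, s in known.items():
            # Atoms in which every predicate of this constraint holds.
            inside = {a for a in atoms if all(a[i] for i in preds)}
            mass = sum(prob[a] for a in inside)
            for a in atoms:
                # Rescale both sides so the constraint holds and the total stays 1.
                if a in inside:
                    prob[a] *= s / mass
                else:
                    prob[a] *= (1.0 - s) / (1.0 - mass)
    return prob


def selectivity(prob, preds):
    """Estimated selectivity of the conjunction of the given predicates."""
    return sum(p for a, p in prob.items() if all(a[i] for i in preds))


# Only single-predicate selectivities known: the estimate is the product.
p2 = max_entropy_atoms(2, {(0,): 0.1, (1,): 0.2})
independent = selectivity(p2, (0, 1))

# A known joint selectivity for predicates 0 and 1 is honoured, while the
# unrelated predicate 2 is still treated as independent.
p3 = max_entropy_atoms(3, {(0,): 0.1, (1,): 0.2, (0, 1): 0.05, (2,): 0.5})
joint = selectivity(p3, (0, 1, 2))
```

In the first case the fitted estimate equals 0.1 × 0.2, matching the abstract's remark that the approach reduces to the independence assumption when no joint information is available. Enumerating atoms is exponential in the number of predicates, so this is only suitable for illustration.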
Profiling relational data: a survey
The VLDB Journal - Volume 24 - Pages 557-581 - 2015
Ziawasch Abedjan, Lukasz Golab, Felix Naumann
Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
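The simpler single-column statistics mentioned in this abstract (null count, distinct count, data type, most frequent value) can be sketched in a few lines of Python. This is an illustrative sketch only, not a tool from the survey: it assumes rows are dictionaries, that a missing value is represented by `None`, and the function name `profile_column` and the sample rows are hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Compute basic single-column metadata for a list of dict rows."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        # Names of the Python types observed among non-null values.
        "types": sorted({type(v).__name__ for v in non_null}),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical sample data.
rows = [
    {"id": 1, "tag": "x"},
    {"id": 2, "tag": None},
    {"id": 3, "tag": "x"},
    {"id": 4, "tag": "y"},
]
tag_profile = profile_column(rows, "tag")
id_profile = profile_column(rows, "id")

# A column whose distinct count equals the row count (with no nulls) is a
# candidate unique column, the single-column case of unique column combinations.
id_is_unique = id_profile["nulls"] == 0 and id_profile["distinct"] == len(rows)
```

Multi-column metadata such as functional or inclusion dependencies needs dedicated algorithms and is not covered by this sketch.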
Total: 788