SIGMOD Record

  0163-5808

  1943-5835

  Mỹ

Cơ quản chủ quản:  ASSOC COMPUTING MACHINERY , Association for Computing Machinery (ACM)

Lĩnh vực:
SoftwareInformation Systems

Các bài báo tiêu biểu

Data mining
Tập 31 Số 1 - Trang 76-77 - 2002
Ian H. Witten, Eibe Frank
Mining association rules between sets of items in large databases
Tập 22 Số 2 - Trang 207-216 - 1993
Rakesh Agrawal, Tomasz Imieliński, Arun Swami

We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.

The cougar approach to in-network query processing in sensor networks
Tập 31 Số 3 - Trang 9-18 - 2002
Yong Yao, Johannes Gehrke

The widespread distribution and availability of small-scale sensors, actuators, and embedded processors is transforming the physical world into a computing platform. One such example is a sensor network consisting of a large number of sensor nodes that combine physical sensing capabilities such as temperature, light, or seismic sensors with networking and computation capabilities. Applications range from environmental control, warehouse inventory, and health care to military environments. Existing sensor networks assume that the sensors are preprogrammed and send data to a central frontend where the data is aggregated and stored for offline querying and analysis. This approach has two major drawbacks. First, the user cannot change the behavior of the system on the fly. Second, conservation of battery power is a major design factor, but a central system cannot make use of in-network programming, which trades costly communication for cheap local computation.In this paper, we introduce the Cougar approach to tasking sensor networks through declarative queries. Given a user query, a query optimizer generates an efficient query plan for in-network query processing, which can vastly reduce resource usage and thus extend the lifetime of a sensor network. In addition, since queries are asked in a declarative language, the user is shielded from the physical characteristics of the network. We give a short overview of sensor networks, propose a natural architecture for a data management system for sensor networks, and describe open research problems in this area.

Information diffusion in online social networks
Tập 42 Số 2 - Trang 17-28 - 2013
Adrien Guille, Hakim Hacid, Cécile Favre, Djamel A. Zighed

Online social networks play a major role in the spread of information at very large scale. A lot of effort have been made in order to understand this phenomenon, ranging from popular topic detection to information diffusion modeling, including influential spreaders identification. In this article, we present a survey of representative methods dealing with these issues and propose a taxonomy that summarizes the state-of-the-art. The objective is to provide a comprehensive analysis and guide of existing efforts around information diffusion in social networks. This survey is intended to help researchers in quickly understanding existing works and possible improvements to bring.

Cluster validity methods
Tập 31 Số 2 - Trang 40-45 - 2002
Maria Halkidi, Yannis Batistakis, Michalis Vazirgiannis

Clustering is an unsupervised process since there are no predefined classes and no examples that would indicate grouping properties in the data set. The majority of the clustering algorithms behave differently depending on the features of the data set and the initial assumptions for defining groups. Therefore, in most applications the resulting clustering scheme requires some sort of evaluation as regards its validity. Evaluating and assessing the results of a clustering algorithm is the main subject of cluster validity. In this paper we present a review of the clustering validity and methods. More specifically, Part I of the paper discusses the cluster validity approaches based on external and internal criteria.

Clustering validity checking methods
Tập 31 Số 3 - Trang 19-27 - 2002
Maria Halkidi, Yannis Batistakis, Michalis Vazirgiannis

Clustering results validation is an important topic in the context of pattern recognition. We review approaches and systems in this context. In the first part of this paper we presented clustering validity checking approaches based on internal and external criteria. In the second, current part, we present a review of clustering validity approaches based on relative criteria. Also we discuss the results of an experimental study based on widely known validity indices. Finally the paper illustrates the issues that are under-addressed by the recent approaches and proposes the research directions in the field.

A case for intelligent disks (IDISKs)
Tập 27 Số 3 - Trang 42-52 - 1998
Kimberly Keeton, David A. Patterson, Joseph M. Hellerstein

Decision support systems (DSS) and data warehousing workloads comprise an increasing fraction of the database market today. I/O capacity and associated processing requirements for DSS workloads are increasing at a rapid rate, doubling roughly every nine to twelve months [38]. In response to this increasing storage and computational demand, we present a computer architecture for decision support database servers that utilizes “intelligent” disks (IDISKs). IDISKs utilize low-cost embedded general-purpose processing, main memory, and high-speed serial communication links on each disk. IDISKs are connected to each other via these serial links and high-speed crossbar switches, overcoming the I/O bus bottleneck of conventional systems. By off-loading computation from expensive desktop processors, IDISK systems may improve cost-performance. More importantly, the IDISK architecture allows the processing of the system to scale with increasing storage demand.

Query languages for graph databases
Tập 41 Số 1 - Trang 50-60 - 2012
Peter T. Wood

Query languages for graph databases started to be investigated some 25 years ago. With much current data, such as linked data on the Web and social network data, being graph-structured, there has been a recent resurgence in interest in graph query languages. We provide a brief survey of many of the graph query languages that have been proposed, focussing on the core functionality provided in these languages. We also consider issues such as expressive power and the computational complexity of query evaluation.

Improved histograms for selectivity estimation of range predicates
Tập 25 Số 2 - Trang 294-305 - 1996
Viswanath Poosala, Peter J. Haas, Yannis Ioannidis, Eugene J. Shekita

Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.

Space-efficient online computation of quantile summaries
Tập 30 Số 2 - Trang 58-66 - 2001
Michael B. Greenwald, Sanjeev Khanna

An ∈-approximate quantile summary of a sequence of N elements is a data structure that can answer quantile queries about the sequence to within a precision of ∈ N .

We present a new online algorithm for computing∈-approximate quantile summaries of very large data sequences. The algorithm has a worst-case space requirement of Ο (1÷∈ log(∈ N )). This improves upon the previous best result of Ο (1÷∈ log 2 (∈ N )). Moreover, in contrast to earlier deterministic algorithms, our algorithm does not require a priori knowledge of the length of the input sequence.

Finally, the actual space bounds obtained on experimental data are significantly better than the worst case guarantees of our algorithm as well as the observed space requirements of earlier algorithms.