SIGMOD Record
Notable scientific publications
Since we last wrote about SQL/XML in [2], the first edition of that new part of the SQL standard has been officially published as an international standard [1], commonly called SQL/XML:2003. At the time of that earlier column, SQL/XML was just entering its first official ballot, meaning that (possibly significant) changes to the text were expected in response to ballot comments submitted by the various participants in the SQL standardization process.
This note presents a translation of a subset of the relational query language SQL into the well-known tuple calculus. Roughly speaking, tuple calculus corresponds to first-order predicate calculus. The SQL subset is relationally complete and represents a "relational core" of the language. Nevertheless, our translation is simple and elegant. It is therefore especially well suited as a beginner's introduction to the principles of a formal definition of SQL.
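To illustrate the flavor of such a translation (the query, relation, and attribute names below are our own illustrative choices, not taken from the note), a single-block SQL query corresponds to a tuple calculus expression as follows:

    % SQL query:
    %   SELECT e.ename
    %   FROM   emp e
    %   WHERE  e.sal > 1000
    % One natural tuple calculus counterpart:
    \[
      \{\, t \mid \exists e \in emp \;(\; t.ename = e.ename \;\wedge\; e.sal > 1000 \;)\,\}
    \]

The existential quantifier ranges over the tuples of emp, the conjunction carries the WHERE predicate, and the output tuple t is built from the SELECT list.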
A new class of database management systems (DBMSs) called NewSQL touts the ability to scale modern on-line transaction processing (OLTP) workloads in a way that is not possible with legacy systems. The term NewSQL was first used by one of the authors of this article in a 2011 business analysis report discussing the rise of new database systems as challengers to the established vendors (Oracle, IBM, Microsoft). The other author was working on what became one of the first examples of a NewSQL DBMS. Since then, several companies and research projects have used this term (rightly and wrongly) to describe their systems.
Given that relational DBMSs have been around for over four decades, it is justifiable to ask whether the claim of NewSQL's superiority is actually true or whether it is simply marketing. If these systems are indeed able to get better performance, the next question is whether there is anything scientifically new about them that enables these gains, or whether hardware has simply advanced so much that the bottlenecks of earlier years are no longer a problem.
To answer these questions, we first discuss the history of databases to understand how NewSQL systems came about. We then provide a detailed explanation of what the term NewSQL means and the different categories of systems that fall under this definition.
This paper examines Jim Gray's role in the specification of the debit/credit benchmark. The publication of this benchmark in a 1985 paper launched a benchmark war among the vendors that resulted in dramatic improvements in database system performance in the years following its publication. It was the genesis of the TPC, an industry consortium that has reshaped the benchmark landscape. Descendants of this benchmark continue to this day to be an important metric for modern transaction processing systems.
We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which show the effectiveness of the algorithm.
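To make the rule-mining task concrete, the following deliberately naive sketch enumerates itemsets, counts support, and emits rules meeting both thresholds. It omits the buffer management and the estimation and pruning techniques that the paper's algorithm adds, and all names and thresholds are our own:

    from itertools import combinations

    def association_rules(transactions, min_support, min_confidence):
        # Exponential generate-and-test miner: fine for illustration,
        # infeasible for the large databases the paper targets.
        n = len(transactions)
        baskets = [frozenset(t) for t in transactions]
        items = sorted(frozenset().union(*baskets))
        support = {}
        for k in range(1, len(items) + 1):
            for cand in combinations(items, k):
                c = frozenset(cand)
                s = sum(1 for b in baskets if c <= b) / n
                if s >= min_support:
                    support[c] = s
        rules = []
        for itemset, s in support.items():
            for k in range(1, len(itemset)):
                for lhs in combinations(sorted(itemset), k):
                    lhs = frozenset(lhs)
                    # Any subset of a frequent itemset is frequent,
                    # so support[lhs] is guaranteed to exist.
                    conf = s / support[lhs]
                    if conf >= min_confidence:
                        rules.append((set(lhs), set(itemset - lhs), s, conf))
        return rules

    rules = association_rules(
        [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]],
        min_support=0.5, min_confidence=0.7)
    # yields, among others: ({'butter'}, {'bread'}, support 0.67, confidence 1.0)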
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.
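As a small, self-contained illustration of one classic histogram type from the taxonomy (equi-depth), here is a sketch that builds bucket boundaries and estimates the selectivity of a range predicate under the usual uniform-spread assumption within each bucket; the function names and details are ours, not the paper's:

    def build_equi_depth(values, num_buckets):
        # Boundary i sits at rank i*n/num_buckets, so each bucket
        # holds roughly n/num_buckets values.
        data = sorted(values)
        n = len(data)
        return [data[min(n - 1, (i * n) // num_buckets)]
                for i in range(num_buckets + 1)]

    def estimate_range_selectivity(bounds, lo, hi):
        # Fraction of rows estimated to satisfy lo <= v <= hi, assuming
        # values are uniformly spread inside each bucket.
        num_buckets = len(bounds) - 1
        covered = 0.0
        for i in range(num_buckets):
            b_lo, b_hi = bounds[i], bounds[i + 1]
            if b_hi < lo or b_lo > hi:
                continue                      # bucket misses the query range
            if b_hi == b_lo:                  # degenerate bucket of duplicates
                covered += 1.0
                continue
            overlap = min(hi, b_hi) - max(lo, b_lo)
            covered += max(0.0, overlap / (b_hi - b_lo))
        return covered / num_buckets          # each bucket ~ 1/num_buckets of rows

    bounds = build_equi_depth(list(range(1000)), num_buckets=10)
    sel = estimate_range_selectivity(bounds, 100, 299)   # ~0.2, true value 0.2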
Object-oriented programming is well-suited to such data-intensive application domains as CAD/CAM, AI, and OIS (office information systems) with multimedia documents. At MCC we have built a prototype object-oriented database system, called ORION. It adds persistence and sharability to objects created and manipulated in applications implemented in an object-oriented programming environment. One of the important requirements of these applications is schema evolution, that is, the ability to dynamically make a wide variety of changes to the database schema. In this paper, following a brief review of the object-oriented data model that we support in ORION, we establish a framework for supporting schema evolution, define the semantics of schema evolution, and discuss its implementation.
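To give a feel for what dynamically changing the schema means for live objects, here is a toy sketch in which attribute additions and deletions are propagated to existing instances. ORION's actual framework (class lattice, schema invariants, and a full taxonomy of changes) is far richer, and every name here is our own:

    class EvolvableClass:
        # Toy illustration of dynamic schema evolution: attributes can be
        # added or dropped at runtime, and existing instances are updated
        # so that old objects conform to the new schema.
        def __init__(self, name, attributes):
            self.name = name
            self.attributes = dict(attributes)  # attribute name -> default
            self.instances = []

        def new(self, **values):
            obj = {a: values.get(a, d) for a, d in self.attributes.items()}
            self.instances.append(obj)
            return obj

        def add_attribute(self, attr, default):
            self.attributes[attr] = default
            for obj in self.instances:          # back-fill old instances
                obj[attr] = default

        def drop_attribute(self, attr):
            del self.attributes[attr]
            for obj in self.instances:
                obj.pop(attr, None)

    doc = EvolvableClass("Document", {"title": ""})
    d = doc.new(title="Q3 report")
    doc.add_attribute("author", "unknown")   # d now carries an 'author' field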
An ε-approximate quantile summary of a sequence of N elements is a data structure that can answer quantile queries about the sequence to within a precision of εN. We present a new online algorithm for computing ε-approximate quantile summaries of very large data sequences. The algorithm has a worst-case space requirement of O((1/ε) log(εN)) and, in contrast to earlier deterministic algorithms, does not require a priori knowledge of the length of the input sequence. Finally, the actual space bounds obtained on experimental data are significantly better than the worst-case guarantees of our algorithm as well as the observed space requirements of earlier algorithms.
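To make the summary structure concrete, here is a simplified sketch in the spirit of the algorithm: it stores (value, g, Δ) entries, where g is the minimum-rank gap to the previous entry and Δ bounds the extra rank uncertainty, periodically merges entries whose combined uncertainty stays within 2εn, and answers quantile queries from accumulated ranks. This is our simplified reading, not the paper's exact algorithm:

    class QuantileSketch:
        def __init__(self, eps):
            self.eps = eps
            self.n = 0
            self.entries = []  # each entry is [value, g, delta]
            self.period = max(1, int(1.0 / (2.0 * eps)))  # compress frequency

        def insert(self, v):
            self.n += 1
            i = 0
            while i < len(self.entries) and self.entries[i][0] < v:
                i += 1
            # New minima/maxima have exactly known rank (delta = 0).
            d = 0 if i in (0, len(self.entries)) else int(2 * self.eps * self.n)
            self.entries.insert(i, [v, 1, d])
            if self.n % self.period == 0:
                self._compress()

        def _compress(self):
            cap = int(2 * self.eps * self.n)
            i = len(self.entries) - 2
            while i >= 1:
                _, g, _ = self.entries[i]
                v2, g2, d2 = self.entries[i + 1]
                if g + g2 + d2 <= cap:      # merging keeps error <= eps*n
                    self.entries[i + 1] = [v2, g + g2, d2]
                    del self.entries[i]
                i -= 1

        def query(self, phi):
            # Return a value whose rank is within eps*n of phi*n.
            target = phi * self.n + self.eps * self.n
            rmin, answer = 0, self.entries[0][0]
            for v, g, d in self.entries:
                rmin += g
                if rmin + d > target:
                    return answer
                answer = v
            return answer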
Validation of clustering results is an important topic in pattern recognition, and we review approaches and systems in this context. In the first part of this paper we presented clustering validity checking approaches based on internal and external criteria. In this second part, we review clustering validity approaches based on relative criteria. We also discuss the results of an experimental study based on widely known validity indices. Finally, the paper highlights issues that are under-addressed by recent approaches and proposes research directions for the field.
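As one concrete, widely known relative validity index of the kind such a study compares, here is a small sketch of the Davies-Bouldin index (lower values indicate more compact, better-separated clusters); the index choice and implementation details are ours:

    import numpy as np

    def davies_bouldin(points, labels):
        # points: (n, d) array-like; labels: one cluster id per row.
        points = np.asarray(points, dtype=float)
        labels = np.asarray(labels)
        ids = sorted(set(labels.tolist()))
        centroids = [points[labels == c].mean(axis=0) for c in ids]
        # Average distance of each cluster's points to its centroid.
        scatter = [np.linalg.norm(points[labels == c] - centroids[k], axis=1).mean()
                   for k, c in enumerate(ids)]
        total = 0.0
        for i in range(len(ids)):
            # Worst-case similarity of cluster i to any other cluster.
            total += max((scatter[i] + scatter[j]) /
                         np.linalg.norm(centroids[i] - centroids[j])
                         for j in range(len(ids)) if j != i)
        return total / len(ids)

    pts = [[0, 0], [0, 1], [5, 5], [5, 6]]
    print(davies_bouldin(pts, [0, 0, 1, 1]))   # small value: well separated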