ACM Computing Surveys
Công bố khoa học tiêu biểu
* Dữ liệu chỉ mang tính chất tham khảo
In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The presence of such a huge amount of approaches is due to the different applications requiring the clustering of XML data. These applications need data in the form of similar contents, tags, paths, structures, and semantics. In this article, we first outline the application contexts in which clustering is useful, then we survey approaches so far proposed relying on the abstract representation of data (instances or schema), on the identified similarity measure, and on the clustering algorithm. In this presentation, we aim to draw a taxonomy in which the current approaches can be classified and compared. We aim at introducing an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the article moves into the description of future trends and research issues that still need to be faced.
Các thước đo tính thú vị đóng một vai trò quan trọng trong khai thác dữ liệu, bất kể loại mẫu nào đang được khai thác. Những thước đo này nhằm mục đích chọn lọc và xếp hạng các mẫu dựa trên mức độ quan tâm tiềm năng của người dùng. Các thước đo tốt cũng cho phép giảm thiểu chi phí về thời gian và không gian trong quá trình khai thác. Bài khảo sát này xem xét các thước đo tính thú vị cho quy tắc và tóm tắt, phân loại chúng theo nhiều góc độ khác nhau, so sánh các thuộc tính của chúng, xác định vai trò của chúng trong quá trình khai thác dữ liệu, đưa ra các chiến lược để chọn thước đo phù hợp cho các ứng dụng và xác định các cơ hội cho nghiên cứu trong tương lai trong lĩnh vực này.
This survey characterizes an emerging research area, sometimes called
A key insight of the framework presented here is that coordination can be seen as the process of
Section 3 summarizes ways of applying a coordination perspective in three different domains:(1) understanding the effects of information technology on human organizations and markets, (2) designing cooperative work tools, and (3) designing distributed and parallel computer systems. In the final section, elements of a research agenda in this new area are briefly outlined.
The design and analysis of adaptive sorting algorithms has made important contributions to both theory and practice. The main contributions from the theoretical point of view are: the description of the complexity of a sorting algorithm not only in terms of the size of a problem instance but also in terms of the disorder of the given problem instance; the establishment of new relationships among measures of disorder; the introduction of new sorting algorithms that take advantage of the existing order in the input sequence; and, the proofs that several of the new sorting algorithms achieve maximal (optimal) adaptivity with respect to several measures of disorder. The main contributions from the practical point of view are: the demonstration that several algorithms currently in use are adaptive; and, the development of new algorithms, similar to currently used algorithms that perform competitively on random sequences and are significantly faster on nearly sorted sequences. In this survey, we present the basic notions and concepts of adaptive sorting and the state of the art of adaptive sorting algorithms.
Recent demands for storing and querying big data have revealed various shortcomings of traditional relational database systems. This, in turn, has led to the emergence of a new kind of complementary nonrelational data store, named as NoSQL. This survey mainly aims at elucidating the design decisions of NoSQL stores with regard to the four nonorthogonal design principles of distributed database systems: data model, consistency model, data partitioning, and the CAP theorem. For each principle, its available strategies and corresponding features, strengths, and drawbacks are explained. Furthermore, various implementations of each strategy are exemplified and crystallized through a collection of representative academic and industrial NoSQL technologies. Finally, we disclose some existing challenges in developing effective NoSQL stores, which need attention of the research community, application designers, and architects.
The variety of data is one of the most challenging issues for the research and practice in data management systems. The data are naturally organized in different formats and models, including structured data, semi-structured data, and unstructured data. In this survey, we introduce the area of multi-model DBMSs that build a single database platform to manage multi-model data. Even though multi-model databases are a newly emerging area, in recent years, we have witnessed many database systems to embrace this category. We provide a general classification and multi-dimensional comparisons for the most popular multi-model databases. This comprehensive introduction on existing approaches and open problems, from the technique and application perspective, make this survey useful for motivating new multi-model database approaches, as well as serving as a technical reference for developing multi-model database applications.
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of complex objects as efficiently as today's database systems manipulate simple records, query-processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software.
This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
- 1
- 2
- 3
- 4
- 5
- 6
- 8