Springer Science and Business Media LLC
Featured scientific publications
* Data is for reference only
On Density-Based Data Streams Clustering Algorithms: A Survey
Springer Science and Business Media LLC - Volume 29 Issue 1 - Pages 116-141 - 2014
Clustering data streams has drawn lots of attention in the last few years due to their ever-growing presence. Data streams put additional challenges on clustering such as limited time and memory and one pass clustering. Furthermore, discovering clusters with arbitrary shapes is very important in data stream applications. Data streams are infinite and evolving over time, and we do not have any knowledge about the number of clusters. In a data stream environment due to various factors, some noise appears occasionally. Density-based method is a remarkable class in clustering data streams, which has the ability to discover arbitrary shape clusters and to detect noise. Furthermore, it does not need the number of clusters in advance. Due to data stream characteristics, the traditional density-based clustering is not applicable. Recently, a lot of density-based clustering algorithms are extended for data streams. The main idea in these algorithms is using density-based methods in the clustering process and at the same time overcoming the constraints, which are put out by data stream’s nature. The purpose of this paper is to shed light on some algorithms in the literature on density-based clustering over data streams. We not only summarize the main density-based clustering algorithms on data streams, discuss their uniqueness and limitations, but also explain how they address the challenges in clustering data streams. Moreover, we investigate the evaluation metrics used in validating cluster quality and measuring algorithms’ performance. It is hoped that this survey will serve as a steppingstone for researchers studying data streams clustering, particularly density-based algorithms.
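The density-based idea the abstract refers to (arbitrary-shape clusters, noise detection, no preset cluster count) can be illustrated with a minimal batch sketch in the style of DBSCAN. This is not one of the stream algorithms the survey covers: it keeps every point in memory and rescans them, which is exactly what the data stream setting rules out. The function name `dbscan` and the parameters `eps` and `min_pts` are illustrative choices.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it. Clusters grow outward from core points.
    """
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force range query; fine for a sketch, too slow for streams.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The number of clusters is never passed in; it falls out of the density parameters, and the isolated point at (50, 50) is reported as noise.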
Shapelet Based Two-Step Time Series Positive and Unlabeled Learning
Springer Science and Business Media LLC - Volume 38 - Pages 1387-1402 - 2023
In the last decade, there has been significant progress in time series classification. However, in real-world industrial settings, it is expensive and difficult to obtain high-quality labeled data. Therefore, the positive and unlabeled learning (PU-learning) problem has become more and more popular recently. The current PU-learning approaches of the time series data suffer from low accuracy due to the lack of negative labeled time series. In this paper, we propose a novel shapelet based two-step (2STEP) PU-learning approach. In the first step, we generate shapelet features based on the positive time series, which are used to select a set of negative examples. In the second step, based on both positive and negative time series, we select the final features and build the classification model. The experimental results show that our 2STEP approach can improve the average F1 score on 15 datasets by 9.1% compared with the baselines, and achieves the highest F1 score on 10 out of 15 time series datasets.
Managing Data-Objects in Dynamically Reconfigurable Caches
Springer Science and Business Media LLC - Volume 25 - Pages 232-245 - 2010
The widening gap between processor and memory speeds makes cache an important issue in the computer system design. Compared with work set of programs, cache resource is often rare. Therefore, it is very important for a computer system to use cache efficiently. Toward a dynamically reconfigurable cache proposed recently, DOOC (Data-Object Oriented Cache), this paper proposes a quantitative framework for analyzing the cache requirement of data-objects, which includes cache capacity, block size, associativity and coherence protocol. And a kind of graph coloring algorithm dealing with the competition between data-objects in the DOOC is proposed as well. Finally, we apply our approaches to the compiler management of DOOC. We test our approaches on both a single-core platform and a four-core platform. Compared with the traditional caches, the DOOC in both platforms achieves an average reduction of 44.98% and 49.69% in miss rate respectively. And its performance is very close to the ideal optimal cache.
An Experimental Study of Text Representation Methods for Cross-Site Purchase Preference Prediction Using the Social Text Data
Springer Science and Business Media LLC - 2017
A radiosity solution for curved surface environments
Springer Science and Business Media LLC - Volume 12 - Pages 414-424 - 1997
Radiosity has been a popular method for photorealistic image generation. But the determination of form factors between curved patches is the most difficult and time consuming procedure, and also the errors caused by approximating source patch’s radiosity with average values are obvious. In this paper, a radiosity algorithm for rendering curved surfaces represented by parameters is described. The contributed radiosity from differential areas on four vertices of the source patch to a receiving point is calculated firstly, then the contribution from the inner area of the source patch is evaluated by interpolating the values on four corners. Both the difficult problem of determining form-factors between curved surfaces and errors mentioned above have been avoided. Comparison of the experimental results using the new algorithm has been made with the ones obtained by traditional method. Some associated techniques such as the visibility test and the adaptive subdivision are also described.
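The step of evaluating the interior of a source patch from its four corner values can be sketched as follows. The abstract does not say which interpolation scheme the paper uses; bilinear interpolation over the patch's parameter square is assumed here purely for illustration, and the function name `bilinear` and its argument order are hypothetical.

```python
def bilinear(c00, c10, c01, c11, u, v):
    """Interpolate four corner values at parametric position (u, v).

    c00, c10, c01, c11 are the values at (u, v) = (0, 0), (1, 0), (0, 1), (1, 1).
    u and v are expected to lie in [0, 1].
    """
    bottom = (1 - u) * c00 + u * c10  # blend along u on the v = 0 edge
    top = (1 - u) * c01 + u * c11     # blend along u on the v = 1 edge
    return (1 - v) * bottom + v * top  # blend the two edges along v


# Contribution at the patch center from corner contributions 1, 2, 3 and 4.
print(bilinear(1.0, 2.0, 3.0, 4.0, 0.5, 0.5))
```

At the corners the function returns the corner values themselves, so the interpolated contribution agrees with the directly computed one wherever the latter is available.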
Continuous Outlier Monitoring on Uncertain Data Streams
Springer Science and Business Media LLC - Volume 29 - Pages 436-448 - 2014
Outlier detection on data streams is an important task in data mining. The challenges become even larger when considering uncertain data. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which can quickly determine the nature of the uncertain elements by pruning to improve the efficiency. Furthermore, we propose a pruning approach — Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD) to reduce the detection cost. It is an estimated outlier probability method which can effectively reduce the amount of calculations. The cost of PCUOD incremental algorithm can satisfy the demand of uncertain data streams. Finally, a new method for parameter variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work to perform outlier detection on uncertain data streams which can handle parameter variable queries simultaneously. Our methods are verified using both real data and synthetic data. The results show that they are able to reduce the required storage and running time.
Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules
Springer Science and Business Media LLC - 2005
Approximation Algorithm Based on Chain Implication for Constrained Minimum Vertex Covers in Bipartite Graphs
Springer Science and Business Media LLC - Volume 23 - Pages 763-768 - 2008
The constrained minimum vertex cover problem on bipartite graphs (the Min-CVCB problem) is an important NP-complete problem. This paper presents a polynomial time approximation algorithm for the problem based on the technique of chain implication. For any given constant $\varepsilon > 0$, if an instance of the Min-CVCB problem has a minimum vertex cover of size $(k_u, k_l)$, our algorithm constructs a vertex cover of size $(k_u^*, k_l^*)$, satisfying $\max\{k_u^*/k_u,\ k_l^*/k_l\} \le 1 + \varepsilon$.
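To make the guarantee concrete, the check below tests whether a returned cover size meets the stated ratio bound. It is only a sketch of the inequality, not of the chain-implication algorithm itself; the function name `within_ratio` is hypothetical, and both optimal sizes are assumed to be positive.

```python
def within_ratio(k_u, k_l, k_u_star, k_l_star, eps):
    """Return True if max(k_u*/k_u, k_l*/k_l) <= 1 + eps.

    (k_u, k_l) is the minimum cover size and (k_u_star, k_l_star) the size
    of the cover actually constructed. Assumes k_u > 0 and k_l > 0.
    """
    return max(k_u_star / k_u, k_l_star / k_l) <= 1 + eps


print(within_ratio(10, 20, 11, 21, eps=0.2))  # ratios 1.1 and 1.05
print(within_ratio(10, 20, 13, 21, eps=0.2))  # ratio 1.3 on the first side
```

The bound applies to each side of the bipartition separately, so a cover that is near-optimal on one side but too large on the other fails the check.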
Shape grammars and shape rules
Springer Science and Business Media LLC - Volume 2 - Pages 124-132 - 1987
How to generate various graphics and how to produce patterns automatically according to the ideas, requirements and rules of the people (experts), are quite important problems in the computer aided design (CAD) and design automation (DA). This paper presents an approach, using shape grammars, on the basis of shape rules to design pictures, which is a preliminary practice of solving the problems artificially. SGIS2D-2D shape grammar and shape rule interactive system has been run on IBM3033 mainframe and IBM PC AT with graphic terminal in PASCAL. This practice has shown that rule-based generating graphics is a good design paradigm.
Total: 1,957