Adaptive dissimilarity index for measuring time series proximity
Abstract
The most widely used measures of time series proximity are the Euclidean distance and dynamic time warping. The latter can be derived from the distance introduced by Maurice Fréchet in 1906 to measure the proximity between curves. The major limitation of these proximity measures is that they rely only on the closeness of the observed values and ignore whether the time series share a similar growth behavior. To alleviate this drawback, we propose a new dissimilarity index, based on an automatic adaptive tuning function, that covers both proximity with respect to values and proximity with respect to behavior. The proposed index is compared numerically with the classical distance measures on two datasets: a synthetic dataset and a dataset from a public health study.
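The abstract names the classical measures but gives no formulas, so the following Python sketch illustrates them together with one plausible way to combine value and behavior proximity. Euclidean distance, dynamic time warping (DTW) and the discrete Fréchet distance (the dynamic program of Eiter and Mannila 1994) follow their standard definitions. The behavior term (a correlation of first differences, written here as `temporal_correlation`) and the tuning function `tuning(cort, k) = 2 / (1 + exp(k * cort))` are assumptions made for illustration, not formulas quoted from the paper; every function name is hypothetical, and `k` is a fixed parameter rather than the automatically tuned one the abstract describes.

```python
import math
from typing import Callable, Sequence

Series = Sequence[float]


def euclidean(s: Series, t: Series) -> float:
    """Euclidean distance between two equal-length series."""
    if len(s) != len(t):
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))


def dtw(s: Series, t: Series) -> float:
    """Classic dynamic time warping with |s_i - t_j| as the local cost."""
    n, m = len(s), len(t)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def discrete_frechet(s: Series, t: Series) -> float:
    """Discrete Fréchet distance via the Eiter-Mannila dynamic program."""
    n, m = len(s), len(t)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = abs(s[i] - t[j])
            if i == 0 and j == 0:
                ca[i][j] = cost
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], cost)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], cost)
            else:
                # Same predecessors as DTW, but the path cost is a max, not a sum.
                best = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best, cost)
    return ca[n - 1][m - 1]


def temporal_correlation(s: Series, t: Series) -> float:
    """ASSUMPTION: behavior proximity as the cosine of first differences.

    Returns 1 when the increments are positively proportional, -1 when they
    are negatively proportional, and 0 when they are uncorrelated.
    """
    ds = [b - a for a, b in zip(s, s[1:])]
    dt = [b - a for a, b in zip(t, t[1:])]
    num = sum(x * y for x, y in zip(ds, dt))
    den = math.sqrt(sum(x * x for x in ds)) * math.sqrt(sum(y * y for y in dt))
    return num / den if den > 0 else 0.0


def tuning(cort: float, k: float) -> float:
    """ASSUMPTION: tuning function 2 / (1 + exp(k * cort)) with k >= 0.

    It equals 1 when cort = 0 or k = 0, shrinks the value-based distance when
    behaviors agree (cort > 0) and inflates it when they disagree (cort < 0).
    """
    return 2.0 / (1.0 + math.exp(k * cort))


def adaptive_dissimilarity(
    s: Series,
    t: Series,
    k: float = 2.0,
    value_distance: Callable[[Series, Series], float] = euclidean,
) -> float:
    """Hypothetical index: value-based distance modulated by behavior."""
    return tuning(temporal_correlation(s, t), k) * value_distance(s, t)


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 3.0]
    b = [1.5, 2.5, 3.5, 4.5]  # same growth as a, shifted upward by 1.5
    c = [1.5, 0.5, 2.5, 1.5]  # closer in values, but zigzag growth
    print(euclidean(a, b), euclidean(a, c))
    print(adaptive_dissimilarity(a, b), adaptive_dissimilarity(a, c))
```

With Euclidean distance alone, `c` looks closer to `a` (√5 ≈ 2.24 versus 3.0). With k = 2 the sketched index reverses the ranking (about 0.72 for `b` versus about 2.24 for `c`), because `b` shares the growth behavior of `a`. Setting k = 0 recovers the plain value-based distance, and `dtw` or `discrete_frechet` can be passed as `value_distance` instead of `euclidean`.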
References
Alt H, Godau M (1992) Measuring the resemblance of polygonal curves. In: Proceedings of 8th Annual ACM Symposium on Computational Geometry. ACM Press, Berlin, pp 102–109
Caiado J, Crato N, Pena D (2006) A periodogram-based metric for time series classification. Comput Stat Data Anal 50:2668–2684
Chouakria Douzal A (2003) Compression technique preserving correlations of a multivariate temporal sequence. In: Berthold MR, Lenz HJ, Bradley E, Kruse R, Borgelt C (eds) Advances in Intelligent Data Analysis. Springer, Berlin Heidelberg, pp 566–577
Eiter T, Mannila H (1994) Computing discrete Fréchet distance. Technical report CD-TR 94/64, Christian Doppler Laboratory for expert systems. TU Vienna, Austria
Fréchet M (1906) Sur quelques points du calcul fonctionnel. Rend Circ Math Palermo 22:1–74
Garcia-Escudero LA, Gordaliza A (2005) A proposal for robust curve clustering. J Classif 22:185–201
Godau M (1991) A natural metric for curves—computing the distance for polygonal chains and approximation algorithms. In: Proceedings of 8th Symposium on Theoretical Aspects of Computer Science. Lecture Notes in Computer Science, Springer-Verlag, New York, pp 127–136
Heckman NE, Zamar RH (2000) Comparing the shapes of regression functions. Biometrika 22:135–144
Hennig C, Hausdorf B (2006) Design of dissimilarity measure: a new dissimilarity measure between species distribution ranges. In: Batagelj V, Bock HH, Ferligoj A, Žiberna A (eds) Data science and classification. Springer, Berlin Heidelberg, pp 29–38
Kakizawa Y, Shumway RH, Taniguchi M (1998) Discrimination and clustering for multivariate time series. J Am Stat Assoc 93(441):328–340
Kaslow RA, Ostrow DG (1987) The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants. Am J Epidemiol 126:310–318
Keller K, Wittfeld K (2004) Distances of time series components by means of symbolic dynamics. Int J Bifurc Chaos 14:693–704
Liao WT (2005) Clustering of time series data—a survey. Pattern Recognit 38:1857–1874
Maharaj EA (2000) Clusters of time series. J Classif 17:297–314
Moller-Levet CS, Klawonn F, Cho KH, Wolkenhauer O (2003) Fuzzy clustering of short time series and unevenly distributed sampling points. In: Berthold MR, Lenz HJ, Bradley E, Kruse R, Borgelt C (eds) Advances in Intelligent Data Analysis. Springer, Berlin Heidelberg, pp 330–340
Oates T, Firoiou L, Cohen PR (1999) Clustering time series with Hidden Markov Models and Dynamic Time Warping. In: Proceedings of 6th IJCAI-99, Workshop on Neural, Symbolic and Reinforcement Learning Methods for Sequence Learning. Stockholm, pp 17–21
Sankoff D, Kruskal JB (eds) (1983) Time warps, string edits, and macromolecules: the theory and practice of sequence comparison. Addison-Wesley, Reading
Serban N, Wasserman L (2004) CATS: cluster after transformation and smoothing. J Am Stat Assoc 100:990–999