Adaptive dissimilarity index for measuring time series proximity

Advances in Data Analysis and Classification - Volume 1 - Pages 5-21 - 2007
Ahlame Douzal Chouakria (1), Panduranga Naidu Nagabhushan (2)
(1) TIMC-IMAG, Université Joseph Fourier Grenoble 1, Cedex, France
(2) Department of Studies in Computer Science, University of Mysore Manasagangothri, Mysore, India

Abstract

The most widely used measures of time series proximity are the Euclidean distance and dynamic time warping. The latter can be derived from the distance introduced by Maurice Fréchet in 1906 to account for the proximity between curves. The major limitation of these proximity measures is that they are based on the closeness of the values regardless of the similarity w.r.t. the growth behavior of the time series. To alleviate this drawback we propose a new dissimilarity index, based on an automatic adaptive tuning function, to include both proximity measures w.r.t. values and w.r.t. behavior. A comparative numerical analysis between the proposed index and the classical distance measures is performed on the basis of two datasets: a synthetic dataset and a dataset from a public health study.
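The abstract does not state the index itself, so the following is only a minimal sketch of one way a value-based distance can be modulated by a behavior-based term. It assumes the form commonly attributed to this index: a temporal correlation coefficient computed on first differences, passed through the tuning function f(x) = 2 / (1 + exp(k x)), and multiplied by a conventional distance (Euclidean here). The function names and the default k are illustrative assumptions, not taken from the text above.

```python
import math


def euclidean(u, v):
    """Conventional value-based distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def temporal_correlation(u, v):
    """Correlation of first differences: +1 for similar growth behavior,
    -1 for opposite behavior, near 0 for unrelated behavior (assumed form)."""
    du = [u[i + 1] - u[i] for i in range(len(u) - 1)]
    dv = [v[i + 1] - v[i] for i in range(len(v) - 1)]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    if den == 0:
        return 0.0  # a constant series has no growth behavior to compare
    return num / den


def adaptive_dissimilarity(u, v, k=2.0):
    """Value distance scaled by a tuning function of the behavior term.
    k >= 0 sets how strongly behavior matters; k = 0 recovers the plain
    Euclidean distance (assumed tuning function, hypothetical default k)."""
    weight = 2.0 / (1.0 + math.exp(k * temporal_correlation(u, v)))
    return weight * euclidean(u, v)


rising = [0, 1, 2, 3]
shifted = [1, 2, 3, 4]   # same growth behavior, offset values
falling = [3, 2, 1, 0]   # opposite growth behavior

print(adaptive_dissimilarity(rising, shifted))
print(adaptive_dissimilarity(rising, falling))
```

Under these assumptions, series with similar growth behavior end up closer than their Euclidean distance suggests, and series with opposite behavior end up farther apart.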

References

Alt H, Godau M (1992) Measuring the resemblance of polygonal curves. In: Proceedings of 8th Annual ACM Symposium on Computational Geometry. ACM Press, Berlin, pp 102–109
Caiado J, Crato N, Pena D (2006) A periodogram-based metric for time series classification. Comput Stat Data Anal 50:2668–2684
Chouakria Douzal A (2003) Compression technique preserving correlations of a multivariate temporal sequence. In: Berthold MR, Lenz HJ, Bradley E, Kruse R, Borgelt C (eds) Advances in Intelligent Data Analysis. Springer, Berlin Heidelberg, pp 566–577
Eiter T, Mannila H (1994) Computing discrete Fréchet distance. Technical report CD-TR 94/64, Christian Doppler Laboratory for expert systems. TU Vienna, Austria
Fréchet M (1906) Sur quelques points du calcul fonctionnel. Rend Circ Math Palermo 22:1–74
Garcia-Escudero LA, Gordaliza A (2005) A proposal for robust curve clustering. J Classif 22:185–201
Godau M (1991) A natural metric for curves: computing the distance for polygonal chains and approximation algorithms. In: Proceedings of 8th Symposium Theoretical Aspects of Computation Science, Springer, Lecture notes in Computer Science, Springer-Verlag New York, pp 127–136
Heckman NE, Zamar RH (2000) Comparing the shapes of regression functions. Biometrika 22:135–144
Hennig C, Hausdorf B (2006) Design of dissimilarity measure: a new dissimilarity measure between species distribution ranges. In: Batagelj V, Bock HH, Ferligoj A, Žiberna A (eds) Data science and classification. Springer, Heidelberg Berlin, pp 29–38
Kakizawa Y, Shumway RH, Taniguchi N (1998) Discrimination and clustering for multivariate time series. J Am Stat Assoc 93(441):328–340
Kaslow RA, Ostrow DG (1987) The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants. Am J Epidemiol 126:310–18
Keller K, Wittfeld K (2004) Distances of time series components by means of symbolic dynamics. Int J Bifurc Chaos 14:693–704
Liao WT (2005) Clustering of time series data: a survey. Pattern Recognit 38:1857–1874
Maharaj EA (2000) Cluster of time series. J Classif 17:297–314
Moller-Levet CS, Klawonn F, Cho KH, Wolkenhauer O (2003) Fuzzy clustering of short time series and unevenly distributed sampling points. In: Berthold MR, Lenz HJ, Bradley E, Kruse R, Borgelt C (eds) Advances in Intelligent Data Analysis. Springer, Berlin Heidelberg, pp 330–340
Oates T, Firoiou L, Cohen PR (1999) Clustering time series with Hidden Markov Models and Dynamic Time Warping. In: Proceedings of 6th IJCAI-99, Workshop on Neural, Symbolic and Reinforcement Learning Methods for Sequence Learning. Stockholm, pp 17–21
Sankoff D, Kruskal JB (eds) (1983) Time warps, string edits, and macromolecules: the theory and practice of sequence comparison. Addison-Wesley, Reading
Serban N, Wasserman L (2004) CATS: cluster after transformation and smoothing. J Am Stat Assoc 100:990–999