Elastic similarity and distance measures for multivariate time series

Knowledge and Information Systems - Volume 65 - Pages 2665-2698 - 2023
Ahmed Shifaz1, Charlotte Pelletier1,2, François Petitjean1, Geoffrey I. Webb1
1Department of Data Science and Artificial Intelligence, Monash University, Melbourne, Australia
2IRISA, UMR CNRS 6074, Université Bretagne Sud, Vannes, France

Abstract

This paper contributes multivariate versions of seven commonly used elastic similarity and distance measures for time series data analytics. Elastic similarity and distance measures can compensate for misalignments along the time axis of time series data. We adapt two existing strategies used in multivariate versions of the well-known Dynamic Time Warping (DTW), namely Independent DTW and Dependent DTW, to these seven measures. While these measures can be applied to various time series analysis tasks, we demonstrate their utility on multivariate time series classification using the nearest neighbor classifier. On 23 well-known datasets, we demonstrate that all but one of the measures achieve the highest accuracy relative to the others on at least one dataset, supporting the value of developing a suite of multivariate similarity and distance measures. We also demonstrate that there are datasets for which the dependent versions of all measures are more accurate than their independent counterparts, and datasets for which the reverse holds. In addition, we construct a nearest neighbor-based ensemble of the measures and show that it is competitive with other state-of-the-art single-strategy multivariate time series classifiers.
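To make the two strategies concrete, the following is a minimal sketch of Independent and Dependent DTW, not the authors' implementation. It assumes a squared difference as the pointwise cost, no warping window, and series stored as lists of time points, each a tuple with one value per dimension. The function names (`dtw`, `dtw_independent`, `dtw_dependent`) are hypothetical and chosen only for this illustration.

```python
from math import inf


def dtw(a, b, cost):
    """Full-window DTW between sequences a and b under a pointwise cost."""
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            c = cost(a[i - 1], b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            curr[j] = c + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def sq_diff(x, y):
    """Squared difference between two scalar values (assumed cost)."""
    return (x - y) ** 2


def sq_euclidean(p, q):
    """Squared Euclidean distance between two multivariate time points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def dtw_independent(s, t):
    """Independent strategy: warp each dimension separately, then sum."""
    dims = len(s[0])
    return sum(
        dtw([p[d] for p in s], [q[d] for q in t], sq_diff)
        for d in range(dims)
    )


def dtw_dependent(s, t):
    """Dependent strategy: one warping path shared by all dimensions."""
    return dtw(s, t, sq_euclidean)


# Two short 2-dimensional series whose dimensions are misaligned differently.
s = [(0, 0), (1, 0), (0, 1)]
t = [(0, 0), (0, 1), (1, 0)]
print(dtw_independent(s, t))  # 2.0
print(dtw_dependent(s, t))    # 3.0
```

With this choice of cost, the independent distance can never exceed the dependent one, because any single shared warping path is also a valid path for each dimension taken alone. Which strategy classifies better is an empirical question that depends on the dataset, as the abstract reports.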
