Noise fuzzy clustering of time series by autoregressive metric
Abstract
We propose a robust fuzzy clustering model for classifying time series, based on the autoregressive metric. In particular, we suggest a clustering procedure which: 1) considers an autoregressive parameterization of the time series, capable of representing a large class of time series; 2) inherits the benefits of the partitioning around medoids approach, classifying time series into clusters characterized by prototypal observed time series (the “medoid” time series), which synthesize the structural information of each cluster; 3) inherits the benefits of the fuzzy approach, capturing the vague (fuzzy) behaviour of particular time series, such as “middle” time series (time series whose features lie between those of the considered clusters over the entire time period) and “switching” time series (time series with a pattern typical of a given cluster during a certain time period and a completely different pattern, similar to another cluster, in another time period); 4) suitably neutralizes the negative influence of “outlier” time series on the clustering procedure: the “outlier” time series are assigned to the so-called “noise cluster”, so the cluster structure is not altered. To illustrate the effectiveness of the proposed model, a simulation study and an application to real time series are carried out.
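The four ingredients listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: each series is summarized by a least-squares AR(p) fit of fixed order p, the autoregressive metric is taken as the squared Euclidean distance between coefficient vectors, memberships follow a generic noise-clustering rule with a constant squared noise distance `delta2`, and medoids are updated by exhaustive search. The function names, the fuzziness parameter `m`, the initial medoids, and all numeric values are hypothetical.

```python
import numpy as np

def ar_coefficients(x, p=2):
    """Least-squares AR(p) fit of a demeaned series (assumed parameterization)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Column k holds lag k: the row for time t contains x[t-1], ..., x[t-p].
    X = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef

def memberships(D2, medoids, m, delta2):
    """Fuzzy memberships in the regular clusters plus the noise cluster."""
    d = D2[:, medoids] + 1e-12               # avoid division by zero at medoids
    w = d ** (-1.0 / (m - 1.0))
    w_noise = delta2 ** (-1.0 / (m - 1.0))   # every series is 'delta2' from noise
    U = w / (w.sum(axis=1, keepdims=True) + w_noise)
    return U, 1.0 - U.sum(axis=1)

def noise_fuzzy_c_medoids(D2, init_medoids, m=2.0, delta2=0.1, n_iter=50):
    """Alternate membership and medoid updates on a squared-distance matrix."""
    medoids = np.asarray(init_medoids)
    for _ in range(n_iter):
        U, _ = memberships(D2, medoids, m, delta2)
        cost = (U ** m).T @ D2               # cost[c, q]: cluster c with medoid q
        new = []
        for c in range(len(medoids)):
            # Best candidate for cluster c that is not already a medoid.
            new.append(next(q for q in np.argsort(cost[c]) if q not in new))
        new = np.array(new)
        if np.array_equal(new, medoids):
            break
        medoids = new
    U, u_noise = memberships(D2, medoids, m, delta2)
    return medoids, U, u_noise

def simulate_ar(phi, n, rng):
    """Simulate an AR(len(phi)) series with standard normal innovations."""
    p = len(phi)
    x = np.zeros(n + p)
    for t in range(p, len(x)):
        x[t] = np.dot(phi, x[t - p:t][::-1]) + rng.standard_normal()
    return x[p:]

# Toy data: two groups of AR series and one structurally different series.
rng = np.random.default_rng(0)
series = ([simulate_ar([0.6, 0.0], 300, rng) for _ in range(5)]
          + [simulate_ar([-0.6, 0.0], 300, rng) for _ in range(5)]
          + [simulate_ar([0.0, -0.9], 300, rng)])
coefs = np.array([ar_coefficients(x, p=2) for x in series])
D2 = ((coefs[:, None, :] - coefs[None, :, :]) ** 2).sum(axis=-1)
medoids, U, u_noise = noise_fuzzy_c_medoids(D2, init_medoids=[0, 5])
print("medoid series:", medoids)
print("noise memberships:", np.round(u_noise, 2))
```

In this sketch a series far from every medoid receives most of its membership in the noise cluster, so it has little weight in the medoid update. The result depends on `delta2` and on the initial medoids, both of which would need tuning on real data.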