Functional data clustering: a survey

Advances in Data Analysis and Classification - Volume 8 - Pages 231-255 - 2013
Julien Jacques¹, Cristian Preda¹
¹ Laboratoire Paul Painlevé, UMR CNRS 8524, Université Lille 1 and Inria Lille-Nord Europe, Villeneuve d’Ascq Cédex, France

Abstract

Clustering techniques for functional data are reviewed, and a classification of the algorithms into four groups is proposed. The first group consists of methods that work directly on the evaluation points of the curves. The second group consists of filtering methods, which first approximate the curves in a finite basis of functions and then cluster the basis expansion coefficients. The third group is composed of methods that perform dimensionality reduction of the curves and clustering simultaneously, leading to a functional representation of the data that depends on the clusters. The last group consists of distance-based methods, which apply clustering algorithms built on distances defined specifically for functional data. A software review and an illustration of these algorithms on real data are also presented.
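
To make the second group (filtering methods) concrete, here is a minimal sketch in Python. It is not taken from the survey. The Fourier basis, the simulated sine and cosine curves, the farthest-point initialisation, and all function names are assumptions chosen for illustration. Each curve is first projected onto a small basis, and a plain k-means is then run on the resulting coefficient vectors.

```python
import math
import random


def fourier_coefficients(values, n_harmonics=2):
    """Project a curve sampled on a regular grid over [0, 1) onto a Fourier basis.

    Returns [a0, a1, b1, ..., aK, bK] for the basis functions
    1, cos(2*pi*k*t), sin(2*pi*k*t), k = 1..K (K = n_harmonics).
    """
    m = len(values)
    coefs = [sum(values) / m]  # coefficient of the constant function
    for k in range(1, n_harmonics + 1):
        cos_part = sum(v * math.cos(2 * math.pi * k * j / m) for j, v in enumerate(values))
        sin_part = sum(v * math.sin(2 * math.pi * k * j / m) for j, v in enumerate(values))
        coefs.extend([2 * cos_part / m, 2 * sin_part / m])
    return coefs


def squared_distance(p, q):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Plain Lloyd k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from the centers chosen so far.
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the old center if a cluster is empty
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def simulate_curves(n_per_group=10, m=50, noise=0.1, seed=0):
    """Hypothetical data: noisy sine curves followed by noisy cosine curves."""
    rng = random.Random(seed)
    grid = [j / m for j in range(m)]
    curves = []
    for shape in (math.sin, math.cos):
        for _ in range(n_per_group):
            curves.append([shape(2 * math.pi * t) + rng.gauss(0, noise) for t in grid])
    return curves


curves = simulate_curves()
coefficients = [fourier_coefficients(c) for c in curves]  # filtering step
labels = kmeans(coefficients, k=2)                        # clustering step
print(labels)
```

The two steps are deliberately decoupled: any basis (B-splines, wavelets, functional principal components) could replace the Fourier projection, and any multivariate clustering method could replace k-means. The third group of methods reviewed in the survey differs precisely in that it does not separate these two steps.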
