Functional data clustering: a survey
Abstract
Clustering techniques for functional data are reviewed. Four groups of clustering algorithms for functional data are proposed. The first group consists of methods that work directly on the evaluation points of the curves. The second group consists of filtering methods, which first approximate the curves in a finite basis of functions and then cluster the basis expansion coefficients. The third group is composed of methods that perform dimensionality reduction of the curves and clustering simultaneously, leading to a functional representation of the data that depends on the clusters. The last group consists of distance-based methods, which apply clustering algorithms built on distances specific to functional data. A software review and an illustration of these algorithms on real data are also presented.
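As an illustration of the second group (filtering methods), the two steps can be sketched as follows. This is a minimal sketch under stated assumptions, not a method taken from the survey: the Legendre polynomial basis, the simulated sine and cosine curves, the noise level, and the helper names `basis_coefficients` and `kmeans` are all choices made here for the example.

```python
import numpy as np
from numpy.polynomial import legendre


def basis_coefficients(curves, grid, degree=3):
    """Least-squares coefficients of each curve in a Legendre basis (assumed basis)."""
    # Rescale the grid to [-1, 1], the natural domain of Legendre polynomials.
    u = 2 * (grid - grid.min()) / (grid.max() - grid.min()) - 1
    design = legendre.legvander(u, degree)  # shape (n_points, degree + 1)
    coefs, *_ = np.linalg.lstsq(design, curves.T, rcond=None)
    return coefs.T  # shape (n_curves, degree + 1)


def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


# Simulated data (an assumption of this sketch): 20 noisy sine curves
# followed by 20 noisy cosine curves, observed on a common grid.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
sines = np.sin(2 * np.pi * grid) + 0.1 * rng.standard_normal((20, grid.size))
cosines = np.cos(2 * np.pi * grid) + 0.1 * rng.standard_normal((20, grid.size))
curves = np.vstack([sines, cosines])

# Step 1: filter each curve into a finite basis. Step 2: cluster the coefficients.
coefs = basis_coefficients(curves, grid, degree=3)
labels = kmeans(coefs, k=2)
```

Here `labels` holds one cluster index per curve. Running the same `kmeans` directly on `curves` instead of `coefs` would correspond to the first group, which works on the evaluation points themselves.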