Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA

Advances in Data Analysis and Classification - Tập 7 Số 3 - Trang 281-300 - 2013
Anastasios Bellas1, Charles Bouveyron1, Marie Cottrell1, Jérôme Lacaille2
1SAMM (EA 4543), Université Paris 1, Paris Cedex 13, France
2Snecma, Groupe Safran, Moissy Cramayel, France

Tóm tắt

Từ khóa


Tài liệu tham khảo

Aggarwal C, Han J, Wang J, Yu P (2004) A framework for projected clustering of high dimensional data streams. In: Proceedings of the 30th International Conference on very large data bases, vol. 30. VLDB Endowment, pp 852–863

Akaike H (1981) Likelihood of a model and information criteria. J Econom 16(1):3–14

Arandjelović O, Cipolla R (2005) Incremental learning of temporally-coherent Gaussian mixture models. In: Proceedings of the British Machine Vision Conference. Oxford, UK, pp 759–768

Babcock B, Datar M, Motwani R, O’Callaghan L (2003) Maintaining variance and k-medians over data stream windows. In: Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on principles of database systems. ACM, pp 234–243

Baek J, McLachlan G, Flack L (2010) Mixtures of factor analyzers with common factor loadings: Applications to the clustering and visualization of high-dimensional data. Pattern Anal Mach Intell IEEE Trans 32(7):1298–1309

Bartholomew D, Knott M, Moustaki I (2011) Latent variable models and factor analysis: a unified approach, vol 899. Wiley, New York

Basilevsky A (2009) Statistical factor analysis and related methods: theory and applications, vol 418. Wiley-Interscience, New York

Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. Pattern Anal Mach Intell IEEE Trans 22(7):719–725

Bouveyron C, Brunet C (2012) Simultaneous model-based clustering and visualization in the fisher discriminative subspace. Stat Comput 22(1):301–324

Bouveyron C, Girard S, Schmid C (2007a) High-dimensional data clustering. Comput Stat Data Anal 52(1):502–519

Bouveyron C, Girard S, Schmid C (2007b) High-dimensional discriminant analysis. Commun Stat Theory Methods 36(14):2607–2623

Cappé O, Moulines E (2009) Online EM algorithm for latent data models. R Stat Soc: Ser B (Stat Methodol) 71:1–21. http://arxiv.org/pdf/0712.4273

Celeux G, Govaert G (1992) A classification em algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–332

Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38. doi: 10.2307/2984875

Domingos P, Hulten G (2001) A general method for scaling up machine learning algorithms and its application to clustering. In: Proceedings of the 18th International Conference on Machine Learning, pp 106–113

Duda R, Har, P, Stork D (1995) Pattern classification and scene analysis, 2nd edn

Figueiredo M, Jain A (2002) Unsupervised learning of finite mixture models. Pattern Anal Mach Intell IEEE Trans 24(3):381–396

Fraley C, Raftery A (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631

Gaber M, Zaslavsky A, Krishnaswamy S (2005) Mining data streams: a review. ACM Sigmod Record 34(2):18–26

Ghahramani Z, Hinton G et al (1996) The em algorithm for mixtures of factor analyzers. Tech. rep., Technical Report CRG-TR-96-1, University of Toronto

Guha S, Mishra N, Motwani R, O’Callaghan L (2000) Clustering data streams. In: Foundations of Computer Science, 2000. In: Proceedings of 41st Annual Symposium on IEEE, pp 359–366

Hall P, Hicks Y, Robinson T (2005) A method to add gaussian mixture models. Technical report, University of Bath

Hall P, Marshall D, Martin R (1998) Incremental eigenanalysis for classification. In: British Machine Vision Conference, vol 1. Citeseer, pp 286–295

Jacques J, Bouveyron C, Girard S, Devos O, Duponchel L, Ruckebusch C (2010) Gaussian mixture models for the classification of high-dimensional vibrational spectroscopy data. J Chemom 24(11–12):719–727

Lindsay B (1995) Mixture models: theory, geometry and applications. In: JSTOR NSF-CBMS Regional Conference Series in probability and statistics.

MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on mathematical statistics and probability, vol. 1. California, USA, p 14

McLachlan G, Krishnan T (1997) The em algorithm and extensions. Wiley-Interscience, New York

McLachlan G, Peel D (2000) Finite mixture models, vol 299. Wiley-Interscience, New York

McLachlan G, Peel D, Bean R (2003) Modelling high-dimensional data by mixtures of factor analyzers. Comput Stat Data Anal 41(3):379–388

McNicholas P, Murphy B (2008) Parsimonious Gaussian mixture models. Stat Comput 18(3):285–296

McNicholas P, Murphy T, McDaid A, Frost D (2010) Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput Stat Data Anal 54(3):711–723

Neal R, Hinton G (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. Learn Graph Models 89:355–368

O’callaghan L, Mishra N, Meyerson A, Guha S, Motwani R (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of 18th International Conference on Data Engineering, pp 685–694

Samé A, Ambroise C, Govaert G (2007) An online classification EM algorithm based on the mixture model. Stat Comput 17(3):209–218. doi: 10.1007/s11222-007-9017-z

Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101

Tipping M, Bishop C (1999) Mixtures of probabilistic principal component analyzers. Neural Comput 11(2):443–482

Titterington D (1984) Recursive parameter estimation using incomplete data. J R Stat Soc Ser B (Methodol) 46(2):257–267

Ueda N, Nakano R, Ghahramani Z, Hinton G (2000) Smem algorithm for mixture models. Neural Comput 12(9):2109–2128

Wang WL, Lin TI (2013) An efficient ecm algorithm for maximum likelihood estimation in mixtures of t-factor analyzers. Comput Stat 28(2):751–759

Wu C (1983) On the convergence properties of the em algorithm. Ann Stat 11(1):95–103

Zhao JH, Yu PL (2008) Fast ml estimation for the mixture of factor analyzers via an ecm algorithm. Neural Netw IEEE Trans 19(11):1956–1961