Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA
References
Aggarwal C, Han J, Wang J, Yu P (2004) A framework for projected clustering of high dimensional data streams. In: Proceedings of the 30th International Conference on Very Large Data Bases, vol 30. VLDB Endowment, pp 852–863
Arandjelović O, Cipolla R (2005) Incremental learning of temporally-coherent Gaussian mixture models. In: Proceedings of the British Machine Vision Conference. Oxford, UK, pp 759–768
Babcock B, Datar M, Motwani R, O’Callaghan L (2003) Maintaining variance and k-medians over data stream windows. In: Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, pp 234–243
Baek J, McLachlan G, Flack L (2010) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32(7):1298–1309
Bartholomew D, Knott M, Moustaki I (2011) Latent variable models and factor analysis: a unified approach, vol 899. Wiley, New York
Basilevsky A (2009) Statistical factor analysis and related methods: theory and applications, vol 418. Wiley-Interscience, New York
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Bouveyron C, Brunet C (2012) Simultaneous model-based clustering and visualization in the Fisher discriminative subspace. Stat Comput 22(1):301–324
Bouveyron C, Girard S, Schmid C (2007a) High-dimensional data clustering. Comput Stat Data Anal 52(1):502–519
Bouveyron C, Girard S, Schmid C (2007b) High-dimensional discriminant analysis. Commun Stat Theory Methods 36(14):2607–2623
Cappé O, Moulines E (2009) Online EM algorithm for latent data models. J R Stat Soc Ser B (Stat Methodol) 71:1–21. http://arxiv.org/pdf/0712.4273
Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–332
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38. doi: 10.2307/2984875
Domingos P, Hulten G (2001) A general method for scaling up machine learning algorithms and its application to clustering. In: Proceedings of the 18th International Conference on Machine Learning, pp 106–113
Duda R, Hart P, Stork D (1995) Pattern classification and scene analysis, 2nd edn
Figueiredo M, Jain A (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396
Fraley C, Raftery A (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631
Gaber M, Zaslavsky A, Krishnaswamy S (2005) Mining data streams: a review. ACM SIGMOD Record 34(2):18–26
Ghahramani Z, Hinton G (1996) The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto
Guha S, Mishra N, Motwani R, O’Callaghan L (2000) Clustering data streams. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science. IEEE, pp 359–366
Hall P, Hicks Y, Robinson T (2005) A method to add Gaussian mixture models. Technical report, University of Bath
Hall P, Marshall D, Martin R (1998) Incremental eigenanalysis for classification. In: Proceedings of the British Machine Vision Conference, vol 1, pp 286–295
Jacques J, Bouveyron C, Girard S, Devos O, Duponchel L, Ruckebusch C (2010) Gaussian mixture models for the classification of high-dimensional vibrational spectroscopy data. J Chemom 24(11–12):719–727
Lindsay B (1995) Mixture models: theory, geometry and applications. In: NSF-CBMS Regional Conference Series in Probability and Statistics
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol 1. California, USA, pp 281–297
McLachlan G, Krishnan T (1997) The EM algorithm and extensions. Wiley-Interscience, New York
McLachlan G, Peel D, Bean R (2003) Modelling high-dimensional data by mixtures of factor analyzers. Comput Stat Data Anal 41(3):379–388
McNicholas P, Murphy T, McDaid A, Frost D (2010) Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput Stat Data Anal 54(3):711–723
Neal R, Hinton G (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. Learn Graph Models 89:355–368
O’Callaghan L, Mishra N, Meyerson A, Guha S, Motwani R (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of the 18th International Conference on Data Engineering, pp 685–694
Samé A, Ambroise C, Govaert G (2007) An online classification EM algorithm based on the mixture model. Stat Comput 17(3):209–218. doi: 10.1007/s11222-007-9017-z
Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101
Tipping M, Bishop C (1999) Mixtures of probabilistic principal component analyzers. Neural Comput 11(2):443–482
Titterington D (1984) Recursive parameter estimation using incomplete data. J R Stat Soc Ser B (Methodol) 46(2):257–267
Ueda N, Nakano R, Ghahramani Z, Hinton G (2000) SMEM algorithm for mixture models. Neural Comput 12(9):2109–2128
Wang WL, Lin TI (2013) An efficient ECM algorithm for maximum likelihood estimation in mixtures of t-factor analyzers. Comput Stat 28(2):751–759