A parametric k-means algorithm

Computational Statistics - Volume 22 - Pages 71-89 - 2007
Thaddeus Tarpey
Department of Mathematics and Statistics, Wright State University, Dayton, USA

Abstract

The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced for estimating principal points: the k-means algorithm is run on a very large data set simulated from a distribution whose parameters are estimated by maximum likelihood. Theoretical and simulation results compare the parametric k-means algorithm to the usual k-means algorithm, and an example on determining sizes of gas masks illustrates the parametric k-means algorithm.
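The procedure described in the abstract can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not the paper's implementation: it assumes a univariate normal model (so the maximum likelihood estimates are the sample mean and the divide-by-n standard deviation), and it uses a plain one-dimensional Lloyd iteration as a stand-in for whatever k-means routine is actually used. The names `kmeans_1d` and `parametric_kmeans_normal`, the simulation size, and the seeds are all hypothetical choices.

```python
import math
import random
import statistics
from bisect import bisect_right
from itertools import accumulate


def kmeans_1d(data, k, iters=100):
    """Lloyd (k-means) iterations for one-dimensional data.

    In one dimension each cluster is an interval, so the assignment
    step reduces to cutting the sorted data at midpoints of centers.
    """
    xs = sorted(data)
    n = len(xs)
    prefix = [0.0] + list(accumulate(xs))  # prefix sums for fast cluster means
    # Start from evenly spaced sample quantiles (an arbitrary choice).
    centers = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        mids = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
        cuts = [0] + [bisect_right(xs, m) for m in mids] + [n]
        new = [
            (prefix[hi] - prefix[lo]) / (hi - lo) if hi > lo else c
            for c, lo, hi in zip(centers, cuts, cuts[1:])
        ]
        if new == centers:  # partition is stable, so the means no longer move
            break
        centers = new
    return centers


def parametric_kmeans_normal(sample, k, n_sim=200_000, seed=0):
    """Parametric k-means sketch, ASSUMING a univariate normal model."""
    mu_hat = statistics.fmean(sample)      # ML estimate of the mean
    sigma_hat = statistics.pstdev(sample)  # ML estimate (divides by n)
    rng = random.Random(seed)
    # Simulate a very large data set from the fitted distribution ...
    simulated = [rng.gauss(mu_hat, sigma_hat) for _ in range(n_sim)]
    # ... and take its k-means cluster means as principal point estimates.
    return kmeans_1d(simulated, k)


# Illustrative data only: 200 draws from N(10, 2^2).
rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
print("parametric k-means:", parametric_kmeans_normal(sample, k=2))
print("usual k-means:     ", kmeans_1d(sample, k=2))
```

For a normal distribution with k = 2, the principal points are known in closed form as the mean plus or minus the standard deviation times sqrt(2/pi), so the parametric estimates here should land close to the fitted mean plus or minus about 0.798 fitted standard deviations. The usual k-means line applies the same clustering routine directly to the 200 observations, which is the nonparametric estimator the abstract contrasts against.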
