A modification of the k-means method for quasi-unsupervised learning

Knowledge-Based Systems - Tập 37 - Trang 176-185 - 2013
David Rebollo-Monedero1, Marc Solé2, Jordi Nin2, Jordi Forné1
1Department of Telematics Engineering, Technical University of Catalonia (UPC), E-08034 Barcelona, Spain
2Department of Computer Architecture, Technical University of Catalonia (UPC), E-08034 Barcelona, Spain

Tài liệu tham khảo

D. Arthur, B. Manthey, H. Roeglin, k-Means has polynomial smoothed complexity, in: Proc. IEEE Annual Symp. Found. Comput. Sci. (FOCS), Atlanta, GA, October 2009, pp. 1157–1160. Bai, 2011, An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data, Knowl.-Based Syst., 24, 785, 10.1016/j.knosys.2011.02.015 A. Bhowmick, A theoretical analysis of Lloyd’s algorithm for k-means clustering, 2009. M. Bilenko, S. Basu, R.J. Mooney, Integrating constraints and metric learning in semi-supervised clustering, in: Proc. Int. Conf. Mach. Learn. (ICML), Banff, Alberta, Canada, July 2004, pp. 81–88. Bishop, 2006 Chapelle, 2006 V. Cross, Fuzzy semantic distance measures between ontological concepts, in: Proc. N. Amer. Fuzzy Inform. Process. Soc. (NAFIPS), 2004, pp. 236–240. Domingo-Ferrer, 2002, Practical data-oriented microaggregation for statistical disclosure control, IEEE Trans. Knowl. Data Eng., 14, 189, 10.1109/69.979982 Domingo-Ferrer, 2005, Ordinal, continuous and heterogenerous k-anonymity through microaggregation, Data Min. Knowl. Disc., 11, 195, 10.1007/s10618-005-0007-5 Duda, 2001 Fisher, 2000, Accurate retail testing of fashion merchandise: methodology and application, J. Market. Sci., 19 A. Frank, A. Asuncion, UCI machine learning repository, Univ. California, Irvine, Sch. Inform., Comput. Sci., 2010. <http://archive.ics.uci.edu/ml>. Frigui, 1999, A robust competitive clustering algorithm with applications in computer vision, IEEE Trans. Pattern Anal. Mach. Intell., 21, 450, 10.1109/34.765656 Gersho, 1992 Gray, 1998, Quantization, IEEE Trans. Inform. Theory, 44, 2325, 10.1109/18.720541 Gupta, 1999, k-Means clustering algorithm for categorical attributes, vol. 1676, 203 Huang, 2010, A classification algorithm based on local cluster centers with a few labeled training examples, Knowl.-Based Syst., 23, 563, 10.1016/j.knosys.2010.03.015 Huang, 1998, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Min. Knowl. Disc., 2, 283, 10.1023/A:1009769707641 Hubert, 1985, Comparing partitions, J. Classif., 2, 193, 10.1007/BF01908075 M. Inaba, N. Katoh, H. Imai, Applications of weighted voronoi diagrams and randomization to variance-based k-clustering, in: Proc. ACM Symp. Comput. Geom., 1994, pp. 332–339. Inaba, 2000, Variance-based k-clustering algorithms by Voronoi diagrams and randomization, IEICE Trans. Inform. Syst., E83-D, 1199 Jacquenet, 2009, Discovering unexpected documents in corpora, Knowl.-Based Syst., 22, 421, 10.1016/j.knosys.2009.05.009 Kanungo, 2002, An efficient k-means clustering algorithm: analysis and implementation, IEEE Trans. Pattern Anal. Mach. Intell., 24, 881, 10.1109/TPAMI.2002.1017616 Kaufman, 2005 Li, 2008, Multinomial mixture model with feature selection for text clustering, Knowl.-Based Syst., 21, 704, 10.1016/j.knosys.2008.03.025 Lloyd, 1982, Least squares quantization in PCM, IEEE Trans. Inform. Theory, IT-28, 129, 10.1109/TIT.1982.1056489 J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. Berkeley Symp. Math. Stat., Prob. I (Stat.), Berkeley, CA, 1965–1966 (Symp.), 1967 (Proc.), 1967, pp. 281–297. Marquardt, 1963, An algorithm for least-squares estimation of nonlinear parameters, SIAM J. Appl. Math. (SIAP), 11, 431, 10.1137/0111030 Martínez, 2012, Semantically-grounded construction of centroids for datasets with textual attributes, Knowledge-Based Systems, 35, 160, 10.1016/j.knosys.2012.04.030 Max, 1960, Quantizing for minimum distortion, IEEE Trans. Inform. Theory, 6, 7, 10.1109/TIT.1960.1057548 Moré, 1977, The Levenberg–Marquardt algorithm: implementation and theory, vol. 630, 105 A. Ng, CS229 course on machine learning, Stanford Univ., 2011. <http://cs229.stanford.edu>. D. Rebollo-Monedero, Quantization and transforms for distributed source coding, Ph.D. dissertation, Stanford Univ., 2007. Rebollo-Monedero, 2011, An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers, Data Knowl. Eng., 70, 892, 10.1016/j.datak.2011.06.005 Steinhaus, 1956, Sur la division des corps matériels en parties, Bull. Pol. Acad. Sci., IV, 801 Studholme, 1999, An overlap invariant entropy measure of 3D medical image alignment, Pattern Recognit., 32, 71, 10.1016/S0031-3203(98)00091-0 Sweeney, 2002, k-Anonymity: a model for protecting privacy, Int. J. Uncertain. Fuzz. Knowl.-Based Syst., 10, 557, 10.1142/S0218488502001648 Ueda, 2010, On a global complexity bound of the Levenberg–Marquardt method, J. Optim. Theory Appl., 147, 443, 10.1007/s10957-010-9731-0 I. Wald, V. Hvran, On building fast kd-trees for ray tracing, and on doing that in o(nlogn), in: Proc. IEEE Symp. Interact. Ray Trac., 2006, pp. 61–69. Willenborg, 2001 Xu, 2005, Survey of clustering algorithms, IEEE Trans. Neural Netw., 16, 645, 10.1109/TNN.2005.845141 Xu, 2002, A fast parallel clustering algorithm for large spatial databases, 263 Zhu, 2010, Data clustering with size constraints, Knowl.-Based Syst., 23, 883, 10.1016/j.knosys.2010.06.003