A k-mean clustering algorithm for mixed numeric and categorical data

Data and Knowledge Engineering - Volume 63, Issue 2 - Pages 503-527 - 2007
Amir Ahmad1, Lipika Dey2
1Solid State Physics Laboratory, Timarpur, Delhi 110 054, India
2Department of Mathematics, IIT Delhi, Hauz Khas, New Delhi 110 016, India

Abstract

Keywords


References

Frawley, 1992, Knowledge discovery in databases: an overview, AI Magazine, 213

Fayyad, 1996

F. Can, E. Ozkarahan, A dynamic cluster maintenance system for information retrieval, in: Proceedings of the Tenth Annual International ACM SIGIR Conference, 1987, pp. 123–131.

M. Eisen, P. Spellman, P. Brown, D. Botstein, Cluster analysis and display of genome-wide expression patterns, in: Proceedings of the National Academy of Sciences of the USA, vol. 95, 1998, pp. 14863–14868.

Duda, 1973

Jain, 1988

J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.

Huang, 1997, Clustering large data sets with mixed numeric and categorical values

Kaufman, 1990

R. Ng, J. Han, Efficient and effective clustering methods for spatial data mining, in: Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994, pp. 144–155.

Huang, 1998, Extensions to the K-modes algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery, 2, 10.1023/A:1009769707641

M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of KDD’96, 1996.

Sander, 1998, Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications, Data Mining and Knowledge Discovery, 2, 169, 10.1023/A:1009745219419

Dunn, 1974, Some recent investigations of a new fuzzy partitional algorithm and its application to pattern classification problems, Journal of Cybernetics, 4, 1, 10.1080/01969727408546062

Bezdek, 1981

Huang, 1999, A fuzzy k-modes algorithm for clustering categorical data, IEEE Transactions on Fuzzy Systems, 7, 446, 10.1109/91.784206

C. Döring, C. Borgelt, R. Kruse, Fuzzy clustering of quantitative and qualitative data, in: Proceedings of NAFIPS, Banff, Alberta, 2004.

Fisher, 1987, Knowledge acquisition via incremental conceptual clustering, Machine Learning, 2, 139, 10.1007/BF00114265

Lebowitz, 1987, Experiments with incremental concept formation, Machine Learning, 2, 103, 10.1007/BF00114264

M. Gluck, J. Corter, Information, uncertainty, and the utility of categories, in: Proceedings of the Seventh Annual Conference of the Cognitive Science Society, 1985, pp. 283–287.

K. McKusick, K. Thompson, COBWEB/3: A portable implementation, Technical Report FIA-90-6-18-2, NASA Ames Research Center, 1990.

Reich, 1991, The formation and use of abstract concepts in design, 323

Biswas, 1998, ITERATE: A conceptual clustering algorithm for data mining, IEEE Transactions on Systems, Man, and Cybernetics, 28C, 219, 10.1109/5326.669556

Cheeseman, 1995, Bayesian classification (AutoClass): Theory and results, Advances in Knowledge Discovery and Data Mining

S. Guha, R. Rastogi, K. Shim, ROCK: A robust clustering algorithm for categorical attributes, in: Proceedings of 15th International Conference on Data Engineering, Sydney, Australia, 23–26 March 1999, pp. 512–521.

V. Ganti, J. Gehrke, R. Ramakrishnan, CACTUS: Clustering categorical data using summaries, in: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999, pp. 73–83.

Modha, 2003, Feature weighting in k-means clustering, Machine Learning, 52, 217, 10.1023/A:1024016609528

T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: An efficient data clustering method for very large databases, in: SIGMOD Conference, 1996, pp. 103–114.

Ankerst, 1999, OPTICS: Ordering points to identify the clustering structure, 49

S. Guha, R. Rastogi, K. Shim, CURE: An efficient clustering algorithm for clustering large databases, in: Proceedings of the Symposium on Management of Data (SIGMOD), 1998.

Karypis, 1999, CHAMELEON: A hierarchical clustering algorithm using dynamic modeling, IEEE Computer, 32, 68, 10.1109/2.781637

Goodall, 1966, A new similarity index based on probability, Biometrics, 22, 882, 10.2307/2528080

Li, 2002, Unsupervised learning with mixed numeric and nominal data, IEEE Transactions on Knowledge and Data Engineering, 14, 673, 10.1109/TKDE.2002.1019208

Huang, 2005, Automated variable weighting in k-means type clustering, IEEE Transactions on PAMI, 27, 10.1109/TPAMI.2005.95

H. Luo, F. Kong, Y. Li, Clustering mixed data based on evidence accumulation, in: X. Li, O.R. Zaiane, Z. Li (Eds.), ADMA 2006, Lecture Notes in Artificial Intelligence 4093.

He, 2005, Scalable algorithms for clustering large datasets with mixed type attributes, International Journal of Intelligent Systems, 20, 1077, 10.1002/int.20108

He, 2002, Squeezer: An efficient algorithm for clustering categorical data, Journal of Computer Science and Technology, 17, 611, 10.1007/BF02948829

Stanfill, 1986, Toward memory-based reasoning, Communications of the ACM, 29, 1213, 10.1145/7902.7906

Witten, 2000

P. Andritsos, P. Tsaparas, R.J. Miller, K.C. Sevcik, LIMBO: Scalable clustering of categorical data, in: 9th International Conference on Extending DataBase Technology (EDBT), March 2004.

Ahmad, 2007, A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set, Pattern Recognition Letters, 28, 110, 10.1016/j.patrec.2006.06.006

Ahmad, 2005, A feature selection technique for classificatory analysis, Pattern Recognition Letters, 26, 43, 10.1016/j.patrec.2004.08.015

Basak, 1998, Unsupervised feature selection using a neuro-fuzzy approach, Pattern Recognition Letters, 19, 997, 10.1016/S0167-8655(98)00083-X

Yeung, 2002, Improving performance of similarity-based clustering by feature weight learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 556, 10.1109/34.993562

Sonbaty, 1998, Fuzzy clustering for symbolic data, IEEE Transactions on Fuzzy Systems, 6, 195, 10.1109/91.669013

A. Ahmad, L. Dey, A K-mean clustering algorithm for mixed numeric and categorical data set using dynamic distance measure, in: Proceedings of Fifth International Conference on Advances in Pattern Recognition, ICAPR2003, 2003.

Won, 2005, A k-populations algorithm for clustering categorical data, Pattern Recognition, 38, 1131, 10.1016/j.patcog.2004.11.017

Peña, 1999, An empirical comparison of four initialization methods for the K-means algorithm, Pattern Recognition Letters, 20, 1027, 10.1016/S0167-8655(99)00069-0

Bradley, 1998, Refining initial points for K-means clustering, 91

Khan, 2004, Cluster center initialization algorithm for K-means clustering, Pattern Recognition Letters, 25, 1293, 10.1016/j.patrec.2004.04.007

Yang, 1999, An evaluation of statistical approaches to text categorization, Journal of Information Retrieval, 1, 67