A k-mean clustering algorithm for mixed numeric and categorical data

Data and Knowledge Engineering - Volume 63, Issue 2 - Pages 503-527 - 2007

Amir Ahmad¹, Lipika Dey²

¹Solid State Physics Laboratory, Timarpur, Delhi 110 054, India

²Department of Mathematics, IIT Delhi, Hauz Khas, New Delhi 110 016, India

References

Frawley, 1992, Knowledge discovery in databases: an overview, AI Magazine, 213

Fayyad, 1996

F. Can, E. Ozkarahan, A dynamic cluster maintenance system for information retrieval, in: Proceedings of the Tenth Annual International ACM SIGIR Conference, 1987, pp. 123–131.

M. Eisen, P. Spellman, P. Brown, D. Botstein, Cluster analysis and display of genome-wide expression patterns, in: Proceedings of the National Academy of Sciences of the USA, vol. 95, 1998, pp. 14863–14868.

Duda, 1973

Jain, 1988

J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.

Huang, 1997, Clustering large data sets with mixed numeric and categorical values

Kaufman, 1990

R. Ng, J. Han, Efficient and effective clustering method for spatial data mining, in: Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994, pp. 144–155.

Huang, 1998, Extensions to the K-modes algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery, 2, 10.1023/A:1009769707641

M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of KDD’96, 1996.

Sander, 1998, Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications, Data Mining and Knowledge Discovery, 2, 169, 10.1023/A:1009745219419

Dunn, 1974, Some recent investigations of a new fuzzy partitional algorithm and its application to pattern classification problems, Journal of Cybernetics, 4, 1, 10.1080/01969727408546062

Bezdek, 1981

Huang, 1999, A fuzzy k-modes algorithm for clustering categorical data, IEEE Transactions on Fuzzy Systems, 7, 446, 10.1109/91.784206

C. Döring, C. Borgelt, R. Kruse, Fuzzy clustering of quantitative and qualitative data, in: Proceedings of NAFIPS, Banff, Alberta, 2004.

Fisher, 1987, Knowledge acquisition via incremental conceptual clustering, Machine Learning, 2, 139, 10.1007/BF00114265

Lebowitz, 1987, Experiments with incremental concept formation, Machine Learning, 2, 103, 10.1007/BF00114264

M. Gluck, J. Corter, Information, uncertainty, and the utility of categories, in: Proceedings of the Seventh Annual Conference of the Cognitive Science Society, 1985, pp. 283–287.

K. McKusick, K. Thomson, COBWEB/3: A portable implementation, Technical Report FIA-90-6-18-2, NASA Ames Research Center, 1990.

Reich, 1991, The formation and use of abstract concepts in design, 323

Biswas, 1998, ITERATE: A conceptual clustering algorithm for data mining, IEEE Transactions on Systems, Man, and Cybernetics, 28C, 219, 10.1109/5326.669556

Cheeseman, 1995, Bayesian classification (AUTO-CLASS): Theory and results, Advances in Knowledge Discovery and Data Mining

S. Guha, R. Rastogi, K. Shim, ROCK: A robust clustering algorithm for categorical attributes, in: Proceedings of 15th International Conference on Data Engineering, Sydney, Australia, 23–26 March 1999, pp. 512–521.

V. Ganti, J. Gehrke, R. Ramakrishnan, CACTUS-clustering categorical data using summaries, in: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999, pp. 73–83.

Modha, 2003, Feature weighting in k-mean clustering, Machine Learning, 52, 217, 10.1023/A:1024016609528

T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: An efficient data clustering method for very large databases, in: SIGMOD Conference, 1996, pp. 103–114.

Ankerst, 1999, OPTICS: Ordering points to identify the clustering structure, 49

S. Guha, R. Rastogi, K. Shim, CURE: An efficient clustering algorithm for clustering large databases, in: Proceedings of the Symposium on Management of Data (SIGMOD), 1998.

Karypis, 1999, CHAMELEON: A hierarchical clustering algorithm using dynamic modeling, IEEE Computer, 32, 68, 10.1109/2.781637

Goodall, 1966, A new similarity index based on probability, Biometrics, 22, 882, 10.2307/2528080

Li, 2002, Unsupervised learning with mixed numeric and nominal data, IEEE Transactions on Knowledge and Data Engineering, 14, 673, 10.1109/TKDE.2002.1019208

Huang, 2005, Automated variable weighting in k-mean type clustering, IEEE Transactions on PAMI, 27, 10.1109/TPAMI.2005.95

H. Luo, F. Kong, Y. Li, Clustering mixed data based on evidence accumulation, in: X. Li, O.R. Zaiane, Z. Li (Eds.), ADMA 2006, Lecture Notes in Artificial Intelligence 4093.

He, 2005, Scalable algorithms for clustering large datasets with mixed type attributes, International Journal of Intelligent Systems, 20, 1077, 10.1002/int.20108

He, 2002, Squeezer: An efficient algorithm for clustering categorical data, Journal of Computer Science and Technology, 17, 611, 10.1007/BF02948829

Stanfill, 1986, Toward memory-based reasoning, Communications of the ACM, 29, 1213, 10.1145/7902.7906

Witten, 2000

P. Andritsos, P. Tsaparas, R.J. Miller, K.C. Sevcik, LIMBO: Scalable clustering of categorical data, in: 9th International Conference on Extending DataBase Technology (EDBT), March 2004.

Ahmad, 2007, A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set, Pattern Recognition Letters, 28, 110, 10.1016/j.patrec.2006.06.006

Ahmad, 2005, A feature selection technique for classificatory analysis, Pattern Recognition Letters, 26, 43, 10.1016/j.patrec.2004.08.015

Basak, 1998, Unsupervised feature selection using a neuro-fuzzy approach, Pattern Recognition Letters, 19, 997, 10.1016/S0167-8655(98)00083-X

Yeung, 2002, Improving performance of similarity-based clustering by feature weight learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 556, 10.1109/34.993562

Sonbaty, 1998, Fuzzy clustering for symbolic data, IEEE Transactions on Fuzzy Systems, 6, 195, 10.1109/91.669013

A. Ahmad, L. Dey, A K-mean clustering algorithm for mixed numeric and categorical data set using dynamic distance measure, in: Proceedings of Fifth International Conference on Advances in Pattern Recognition, ICAPR2003, 2003.

Won, 2005, A k-populations algorithm for clustering categorical data, Pattern Recognition, 38, 1131, 10.1016/j.patcog.2004.11.017

Peña, 1999, An empirical comparison of four initialization methods for the K-mean algorithm, Pattern Recognition Letters, 20, 1027, 10.1016/S0167-8655(99)00069-0

Bradley, 1998, Refining initial points for K-mean clustering, 91

Khan, 2004, Cluster center initialization algorithm for K-mean clustering, Pattern Recognition Letters, 25, 1293, 10.1016/j.patrec.2004.04.007

Yang, 1999, An evaluation of statistical approaches to text categorization, Journal of Information Retrieval, 1, 67
