A method for k-means-like clustering of categorical data

Journal of Ambient Intelligence and Humanized Computing - Tập 14 Số 11 - Trang 15011-15021 - 2023

Thu Hien Nguyen¹, Duy-Tai Dinh¹, Songsak Sriboonchitta², Van–Nam Huynh¹

¹Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan

²Faculty of Economics, Centre of Excellence in Econometrics, Chiang Mai University, Chiang Mai, Thailand

Tóm tắt

Từ khóa

Tài liệu tham khảo

Aitchison J, Aitken CGG (1976) Multivariate binary discrimination by the kernel method. Biometrika 63(3):413–420. https://doi.org/10.1093/biomet/63.3.413

Berkhin P (2002) Survey of clustering data mining techniques. Technical report

Blake CL, Merz CJ (1998) UCI Repository of machine learning databases. University of California, Department of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.html

Boriah S, Chandola V, Kumar V (2008) Similarity measures for categorical data: a comparative evaluation. In: Proceedings of the SIAM international conference on data mining, SDM—2008, pp 243–254. https://doi.org/10.1137/1.9781611972788.22

Chen L, Wang S (2013) Central clustering of categorical data with automated feature weighting. In: Proceedings of the twenty-third international joint conference on artificial intelligence, pp 1260–1266. https://www.ijcai.org/Proceedings/13/Papers/190.pdf

Fahad et al (2014) A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Top Comput 2(3):267–279. https://doi.org/10.1109/TETC.2014.2330519

Ganti V, Gehrke J, Ramakrishnan R (1999) CATUS—clustering categorical data using summaries. In: Proceedings of the international conference on knowledge discovery and data mining, (San Diego, USA), pp 73–83. https://doi.org/10.1145/312129.312201

Gibson D, Kleinberg J, Raghavan P (2000) Clustering categorical data: an approach based on dynamic systems. VLDB J 8:222–236. https://doi.org/10.1007/s007780050005

Guha S, Rastogi R, Shim K (2000) ROCK: a robust clustering algorithm for categorical attributes. Inf Syst 25(5):345–366. https://doi.org/10.1016/S0306-4379(00)00022-3

Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. In: Proceedings of ACM SIGMOD international conference on management of data, New York, pp 73–84. https://doi.org/10.1145/276304.276312

Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco

Huang Z (1997) Clustering large data sets with mixed numeric and categorical values. In: Lu H, Motoda H, Liu H (eds) KDD: techniques and applications. World Scientific, Singapore, pp 21–34

Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2:283–304. https://doi.org/10.1023/A:1009769707641

Huang Z, Ng MK, Rong H, Li Z (2005) Automated variable weighting in $$k$$-means type clustering. IEEE Trans Pattern Anal Mach Intell 27(5):657–668. https://doi.org/10.1109/TPAMI.2005.95

Hubert L, Arabie P (1995) Comparing partitions. J Classif 2(1):193–218. https://doi.org/10.1007/BF01908075

Ienco D, Pensa RG, Meo R (2012) From context to distance: learning dissimilarity for categorical data clustering. ACM Trans Knowl Discov Data 6(1):1–25. https://doi.org/10.1145/2133360.2133361

Ienco D, Pensa RG, Meo R (2009) Context-based distance learning for categorical data clustering. In: Advances in intelligent data analysis viii: 8th international symposium. Springer, pp 83–94. https://doi.org/10.1007/978-3-642-03915-7_8

Kogan J, Teboulle M, Nicholas C (2005) Data driven similarity measures for $$k$$-means like clustering algorithms. Inf Retr 8(2):331–349. https://doi.org/10.1007/s10791-005-5666-8

Kushwaha N, Pant M (2018) Fuzzy magnetic optimization clustering algorithm with its application to health care. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-018-0941-x

LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539

Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of the 15th international conference on machine learning, pp 296–304. http://dl.acm.org/citation.cfm?id=645527.657297

MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth symposium on mathematical statistics and probability, Berkeley, CA, 1967, vol 1, no. AD 669871, pp 281–297

Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge

Ng MK, Li MJ, Huang JZ, He Z (2007) On the impact of dissimilarity measure in $$k$$-modes clustering algorithm. IEEE Trans Pattern Anal Mach Intell 29:503–507. https://doi.org/10.1109/TPAMI.2007.53

Nguyen TTH, Huynh VN (2016) A $$k$$-means like algorithm for clustering categorical data using an information theoretic-based dissimilarity measure. In: Foundations of information and knowledge systems—9th international symposium, FoIKS-2016. Springer, pp 115–130. https://doi.org/10.1007/978-3-319-30024-5_7

San OM, Huynh VN, Nakamori Y (2004) An alternative extension of the $$k$$-means algorithm for clustering categorical data. Int J Appl Math Comput Sci 14(2):241–247. http://matwbn.icm.edu.pl/ksiazki/amc/amc14/amc14212.pdf

Selim SZ, Ismail MA (1984) k-Means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans Pattern Anal Mach Intell 6:81–87. https://doi.org/10.1109/TPAMI.1984.4767478

Shirkhorshidi A, Aghabozorgi S, Wah T, Herawan T, (2014) Big data clustering: a review. In: Computational science and its applications—ICCSA (2014) 14th international conference, Guimaraes, Portugal, Proceedings, part V, pp 707–720: https://doi.org/10.1007/978-3-319-09156-3_49

Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617. https://doi.org/10.1162/153244303321897735

Sumangali K, Aswani Kumar Ch (2019) Concept lattice simplification in formal concept analysis using attribute clustering. J Ambient Intell Humaniz Comput 10:2327–2343. https://doi.org/10.1007/s12652-018-0831-2

Tellaroli P, Bazzi M, Donato M, Brazzale AR, Draghici S (2016) Cross-clustering: a partial clustering algorithm with automatic estimation of the number of clusters. PLoS One 11(3):e0152333. https://doi.org/10.1371/journal.pone.0152333

Titterington DM (1980) A comparative study of kernel-based density estimates for categorical data. Technometrics 22(2):259–268. https://doi.org/10.1080/00401706.1980.10486142

Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2:165–193. https://doi.org/10.1007/s40745-015-0040-1

Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678. https://doi.org/10.1109/TNN.2005.845141

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA