Clustering with Instance and Attribute Level Side Information

International Journal of Computational Intelligence Systems, Volume 3, Pages 770-785, 2010

Jinlong Wang¹,², Shunyao Wu¹, Gang Li³

¹School of Computer Engineering, Qingdao Technological University, Qingdao, China

²Medical College of Qingdao University, Qingdao, China

³School of Information Technology, Deakin University, Victoria, Australia

Abstract

Selecting a suitable proximity measure is one of the fundamental tasks in clustering. How to effectively utilize all available side information, including instance-level information in the form of pairwise constraints and attribute-level information in the form of attribute order preferences, is an essential problem in metric learning. In this paper, we propose a learning framework in which both pairwise constraints and attribute order preferences can be incorporated simultaneously. The underlying theory and the related parameter-tuning technique are described in detail. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method.
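The abstract does not state the objective the authors optimize, so the following is only a hypothetical sketch of the general idea of combining both kinds of side information, not the paper's formulation. It assumes a diagonal metric (one non-negative weight per attribute, weights summing to 1), turns must-link and cannot-link pairs into a linear cost, adds a ridge term so the problem is strictly convex, and encodes each attribute order preference "attribute a should weigh at least as much as attribute b" as a soft quadratic penalty rather than a hard constraint. It is solved by projected gradient descent onto the probability simplex. The names `learn_attribute_weights`, `project_to_simplex`, `weighted_sq_distance`, `mu`, `lam`, and `step` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def project_to_simplex(v: Sequence[float]) -> Vector:
    """Euclidean projection of v onto {w : w_k >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:  # holds for j = 1..rho, so the last hit gives theta
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def weighted_sq_distance(w: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical learned proximity: sum_k w_k (x_k - y_k)^2."""
    return sum(wk * (xk - yk) ** 2 for wk, xk, yk in zip(w, x, y))


def learn_attribute_weights(
    X: Sequence[Sequence[float]],
    must_link: Sequence[Tuple[int, int]],
    cannot_link: Sequence[Tuple[int, int]],
    order_prefs: Sequence[Tuple[int, int]],  # (a, b): weight of a should be >= weight of b
    mu: float = 1.0,     # ridge strength (assumed), keeps the solution unique
    lam: float = 10.0,   # order-preference penalty strength (assumed)
    step: float = 0.01,  # gradient step, kept below 2 / Lipschitz constant
    iters: int = 2000,
) -> Vector:
    d = len(X[0])
    # c_k: squared spread of attribute k over must-link pairs minus cannot-link pairs.
    c = [0.0] * d
    for i, j in must_link:
        for k in range(d):
            c[k] += (X[i][k] - X[j][k]) ** 2
    for i, j in cannot_link:
        for k in range(d):
            c[k] -= (X[i][k] - X[j][k]) ** 2
    w = [1.0 / d] * d
    for _ in range(iters):
        # Gradient of c^T w + mu * ||w||^2 + lam * sum max(0, w_b - w_a)^2.
        grad = [c[k] + 2.0 * mu * w[k] for k in range(d)]
        for a, b in order_prefs:
            violation = w[b] - w[a]
            if violation > 0:
                grad[b] += 2.0 * lam * violation
                grad[a] -= 2.0 * lam * violation
        w = project_to_simplex([w[k] - step * grad[k] for k in range(d)])
    return w
```

For example, with the must-link pair (0, 1) and the cannot-link pair (0, 2) on the toy data `[[0, 0, 0], [0.1, 1, 0], [1, 0, 0], [1, 1, 1]]`, the sketch puts most of the weight on attribute 0; adding the preference `(2, 0)` pulls the weights of attributes 0 and 2 almost level. The learned weights could then define the distance used by a k-means-style clusterer. A hard-constrained quadratic program is an equally plausible design; the soft penalty is used here only to keep the sketch dependency-free.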
