Hierarchical Clustering Algorithms for Document Datasets

Data Mining and Knowledge Discovery - 2005

Ying Zhao¹, George Karypis¹, Usama M. Fayyad¹

¹Department of Computer Science and Engineering and Digital Technology Center and Army HPC Research Center, University of Minnesota, Minneapolis

References

Aggarwal, C.C., Gates, S.C., and Yu, P.S. 1999. On the merits of building categorization systems by supervised clustering. In Proc. of the Fifth ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, pp. 352–356.

Beeferman, D. and Berger, A. 2000. Agglomerative clustering of a search engine query log. In Proc. of the Sixth ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, pp. 407–416.

Boley, D., Gini, M., Gross, R., Han, E.H., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., and Moore, J. 1999. Document categorization and query generation on the world wide web using WebACE. AI Review, 11:365–391.

Boley, D., Gini, M., Gross, R., Han, E.H., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., and Moore, J. 1999. Partitioning-based clustering for web document categorization. Decision Support Systems, 27(3):329–341.

Boley, D. 1998. Principal direction divisive partitioning. Data Mining and Knowledge Discovery, 2(4):325–344.

Cheeseman, P. and Stutz, J. 1996. Bayesian classification (AutoClass): Theory and results. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, pp. 153–180. AAAI/MIT Press.

Chung-Kuan Cheng and Yen-Chuen A. 1991. An improved two-way partitioning algorithm with stable performance. IEEE Transactions on Computer Aided Design, 10(12):1502–1511.

Cutting, D.R., Pedersen, J.O., Karger, D.R., and Tukey, J.W. 1992. Scatter/gather: A cluster-based approach to browsing large document collections. In Proceedings of the ACM SIGIR, Copenhagen, pp. 318–329.

Devore, J. and Peck, R. 1997. Statistics: The Exploration and Analysis of Data. Belmont, CA: Duxbury Press.

Dhillon, I., Guan, Y., and Kogan, J. 2002. Iterative clustering of high dimensional text data augmented by local search. In Proc. of the 2002 IEEE International Conference on Data Mining, pp. 131–138.

Dhillon, I.S. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. In Knowledge Discovery and Data Mining, pp. 269–274.

Dhillon, I.S. and Modha, D.S. 2001. Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1/2):143–175.

Ding, C., He, X., Zha, H., Gu, M., and Simon, H. 2001. Spectral min-max cut for graph partitioning and data clustering. Technical Report LBNL-47937, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA.

Duda, R.O., Hart, P.E., and Stork, D.G. 2001. Pattern Classification. John Wiley & Sons.

Guha, S., Rastogi, R., and Shim, K. 1998. CURE: An efficient clustering algorithm for large databases. In Proc. of 1998 ACM-SIGMOD Int. Conf. on Management of Data, pp. 73–84.

Guha, S., Rastogi, R., and Shim, K. 1999. ROCK: A robust clustering algorithm for categorical attributes. In Proc. of the 15th Int’l Conf. on Data Eng., pp. 512–521.

Hagen, L. and Kahng, A. 1991. Fast spectral methods for ratio cut partitioning and clustering. In Proceedings of IEEE International Conference on Computer Aided Design, pp. 10–13.

Han, E.H., Boley, D., Gini, M., Gross, R., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., and Moore, J. 1998. WebACE: A web agent for document categorization and exploration. In Proc. of the 2nd International Conference on Autonomous Agents, pp. 408–415.

Han, E.H., Karypis, G., Kumar, V., and Mobasher, B. 1998. Hypergraph based clustering in high-dimensional data sets: A summary of results. Bulletin of the Technical Committee on Data Engineering, 21(1):15–22.

Jain, A.K. and Dubes, R.C. 1988. Algorithms for Clustering Data. Prentice Hall.

Karypis, G., Han, E.H., and Kumar, V. 1999. Chameleon: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 32(8):68–75.

Karypis, G. 2002. CLUTO a clustering toolkit. Technical Report 02-017, Dept. of Computer Science, University of Minnesota. Available at http://www.cs.umn.edu/~cluto.

King, B. 1967. Step-wise clustering procedures. Journal of the American Statistical Association, 69:86–101.

Kohavi, R. and Sommerfield, D. 1995. Feature subset selection using the wrapper method: Overfitting and dynamic search space topology. In Proc. of the First Int’l Conference on Knowledge Discovery and Data Mining. Montreal, Quebec, pp. 192–197.

Larsen, B. and Aone, C. 1999. Fast and effective text mining using linear-time document clustering. In Proc. of the Fifth ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, pp. 16–22.

Leouski, A. and Croft, W. 1996. An evaluation of techniques for clustering search results. Technical Report IR-76, Department of Computer Science, University of Massachusetts, Amherst.

Lewis, D.D. 1999. Reuters-21578 text categorization test collection distribution 1.0. http://www.research.att.com/~lewis.

MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In Proc. 5th Symp. Math. Statist. Prob., pp. 281–297.

Moore, J., Han, E., Boley, D., Gini, M., Gross, R., Hastings, K., Karypis, G., Kumar, V., and Mobasher, B. 1997. Web page categorization and feature selection using association rule and principal component clustering. In 7th Workshop on Information Technologies and Systems.

Ng, R. and Han, J. 1994. Efficient and effective clustering method for spatial data mining. In Proc. of the 20th VLDB Conference, Santiago, Chile, pp. 144–155.

Porter, M.F. 1980. An algorithm for suffix stripping. Program, 14(3):130–137.

Puzicha, J., Hofmann, T., and Buhmann, J. 2000. A theory of proximity based clustering: Structure detection by optimization. PATREC: Pattern Recognition, Pergamon Press, 33(4):617–634.

Salton, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley.

Savaresi, S. and Boley, D. 2001. On the performance of bisecting k-means and PDDP. In First SIAM International Conference on Data Mining (SDM’2001).

Shi, J. and Malik, J. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905.

Sneath, P.H. and Sokal, R.R. 1973. Numerical Taxonomy. London, UK: Freeman.

Steinbach, M., Karypis, G., and Kumar, V. 2000. A comparison of document clustering techniques. In KDD Workshop on Text Mining.

Strehl, A. and Ghosh, J. 2000. Scalable approach to balanced, high-dimensional clustering of market-baskets. In Proceedings of HiPC, pp. 525–536.

TREC. 1999. Text REtrieval conference. http://trec.nist.gov.

van Rijsbergen, C.J. 1979. Information Retrieval. Butterworths, London.

Willett, P. 1988. Recent trends in hierarchic document clustering: A critical review. Information Processing and Management, 24(5):577–597.

Yahoo! Yahoo! http://www.yahoo.com.

Zahn, K. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, (C-20):68–86.

Zha, H., He, X., Ding, C., Simon, H., and Gu, M. 2001. Bipartite graph partitioning and data clustering. In CIKM, pp. 25–32.

Zhang, B., Kleyner, G., and Hsu, M. 1999. A local search approach to K-clustering. HP Labs Technical Report HPL-1999-119, Hewlett-Packard Laboratories.

Zhao, Y. and Karypis, G. 2002. Evaluation of hierarchical clustering algorithms for document datasets. In Proc. of Int’l. Conf. on Information and Knowledge Management, pp. 515–524.

Zhao, Y. and Karypis, G. 2004. Empirical and theoretical comparisons of selected criterion functions for document clustering. Machine Learning, 55(3):311–331.
