Hierarchical Clustering Algorithms for Document Datasets
Abstract
Keywords
References
Aggarwal, C.C., Gates, S.C., and Yu, P.S. 1999. On the merits of building categorization systems by supervised clustering. In Proc. of the Fifth ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, pp. 352–356.
Beeferman, D. and Berger, A. 2000. Agglomerative clustering of a search engine query log. In Proc. of the Sixth ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, pp. 407–416.
Boley, D., Gini, M., Gross, R., Han, E.H., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., and Moore, J. 1999. Document categorization and query generation on the world wide web using WebACE. AI Review, 11:365–391.
Boley, D., Gini, M., Gross, R., Han, E.H., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., and Moore, J. 1999. Partitioning-based clustering for web document categorization. Decision Support Systems, 27(3):329–341.
Boley, D. 1998. Principal direction divisive partitioning. Data Mining and Knowledge Discovery, 2(4):325–344.
Cheeseman, P. and Stutz, J. 1996. Bayesian classification (AutoClass): Theory and results. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, pp. 153–180. AAAI/MIT Press.
Cheng, C.-K. and Wei, Y.-C.A. 1991. An improved two-way partitioning algorithm with stable performance. IEEE Transactions on Computer Aided Design, 10(12):1502–1511.
Cutting, D.R., Pedersen, J.O., Karger, D.R., and Tukey, J.W. 1992. Scatter/gather: A cluster-based approach to browsing large document collections. In Proceedings of the ACM SIGIR. Copenhagen, pp. 318–329.
Devore, J. and Peck, R. 1997. Statistics: The Exploration and Analysis of Data. Belmont, CA: Duxbury Press.
Dhillon, I., Guan, Y., and Kogan, J. 2002. Iterative clustering of high dimensional text data augmented by local search. In Proc. of the 2002 IEEE International Conference on Data Mining, pp. 131–138.
Dhillon, I.S. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. In Knowledge Discovery and Data Mining, pp. 269–274.
Dhillon, I.S. and Modha, D.S. 2001. Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1/2):143–175.
Ding, C., He, X., Zha, H., Gu, M., and Simon, H. 2001. Spectral min-max cut for graph partitioning and data clustering. Technical Report LBNL-47937, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA.
Duda, R.O., Hart, P.E., and Stork, D.G. 2001. Pattern Classification. John Wiley & Sons.
Guha, S., Rastogi, R., and Shim, K. 1998. CURE: An efficient clustering algorithm for large databases. In Proc. of 1998 ACM-SIGMOD Int. Conf. on Management of Data, pp. 73–84.
Guha, S., Rastogi, R., and Shim, K. 1999. ROCK: A robust clustering algorithm for categorical attributes. In Proc. of the 15th Int’l Conf. on Data Eng., pp. 512–521.
Hagen, L. and Kahng, A. 1991. Fast spectral methods for ratio cut partitioning and clustering. In Proceedings of IEEE International Conference on Computer Aided Design, pp. 10–13.
Han, E.H., Boley, D., Gini, M., Gross, R., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., and Moore, J. 1998. WebACE: A web agent for document categorization and exploration. In Proc. of the 2nd International Conference on Autonomous Agents, pp. 408–415.
Han, E.H., Karypis, G., Kumar, V., and Mobasher, B. 1998. Hypergraph based clustering in high-dimensional data sets: A summary of results. Bulletin of the Technical Committee on Data Engineering, 21(1):15–22.
Jain, A.K. and Dubes, R.C. 1988. Algorithms for Clustering Data. Prentice Hall.
Karypis, G., Han, E.H., and Kumar, V. 1999. Chameleon: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 32(8):68–75.
Karypis, G. 2002. CLUTO: A clustering toolkit. Technical Report 02-017, Dept. of Computer Science, University of Minnesota. Available at http://www.cs.umn.edu/~cluto.
King, B. 1967. Step-wise clustering procedures. Journal of the American Statistical Association, 69:86–101.
Kohavi, R. and Sommerfield, D. 1995. Feature subset selection using the wrapper method: Overfitting and dynamic search space topology. In Proc. of the First Int’l Conference on Knowledge Discovery and Data Mining. Montreal, Quebec, pp. 192–197.
Larsen, B. and Aone, C. 1999. Fast and effective text mining using linear-time document clustering. In Proc. of the Fifth ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining. pp. 16–22.
Leouski, A. and Croft, W. 1996. An evaluation of techniques for clustering search results. Technical Report IR-76, Department of Computer Science, University of Massachusetts, Amherst.
Lewis, D.D. 1999. Reuters-21578 text categorization test collection distribution 1.0. http://www.research.att.com/~lewis.
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In Proc. 5th Symp. Math. Statist. Prob., pp. 281–297.
Moore, J., Han, E., Boley, D., Gini, M., Gross, R., Hastings, K., Karypis, G., Kumar, V., and Mobasher, B. 1997. Web page categorization and feature selection using association rule and principal component clustering. In 7th Workshop on Information Technologies and Systems.
Ng, R. and Han, J. 1994. Efficient and effective clustering method for spatial data mining. In Proc. of the 20th VLDB Conference. Santiago, Chile, pp. 144–155.
Puzicha, J., Hofmann, T., and Buhmann, J. 2000. A theory of proximity based clustering: Structure detection by optimization. Pattern Recognition, 33(4):617–634.
Salton, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley.
Savaresi, S. and Boley, D. 2001. On the performance of bisecting k-means and PDDP. In First SIAM International Conference on Data Mining (SDM’2001).
Shi, J. and Malik, J. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905.
Sneath, P.H. and Sokal, R.R. 1973. Numerical Taxonomy. London, UK: Freeman.
Steinbach, M., Karypis, G., and Kumar, V. 2000. A comparison of document clustering techniques. In KDD Workshop on Text Mining.
Strehl, A. and Ghosh, J. 2000. Scalable approach to balanced, high-dimensional clustering of market-baskets. In Proceedings of HiPC, pp. 525–536.
TREC. 1999. Text REtrieval conference. http://trec.nist.gov.
van Rijsbergen, C.J. 1979. Information Retrieval. Butterworths, London.
Willett, P. 1988. Recent trends in hierarchic document clustering: A critical review. Information Processing and Management, 24(5):577–597.
Yahoo! Yahoo! http://www.yahoo.com.
Zahn, K. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, (C-20):68–86.
Zha, H., He, X., Ding, C., Simon, H., and Gu, M. 2001. Bipartite graph partitioning and data clustering. In CIKM, pp. 25–32.
Zhang, B., Kleyner, G., and Hsu, M. 1999. A local search approach to K-clustering. HP Labs Technical Report HPL-1999-119, Hewlett-Packard Laboratories.
Zhao, Y. and Karypis, G. 2002. Evaluation of hierarchical clustering algorithms for document datasets. In Proc. of Int’l. Conf. on Information and Knowledge Management. pp. 515–524.