Improving document clustering using Okapi BM25 feature weighting
Abstract
Keywords