A comparison of extrinsic clustering evaluation metrics based on formal constraints

Springer Science and Business Media LLC - Volume 12, Issue 4 - Pages 461-486 - 2009
Enrique Amigó, Julio Gonzalo, Javier Artiles, Felisa Verdejo
Departamento de Lenguajes y Sistemas Informáticos, UNED, Madrid, Spain

References

Artiles, J., Gonzalo, J., & Sekine, S. (2007). The Semeval-2007 Weps evaluation: Establishing a benchmark for the web people search task. In Proceedings of the 4th International Workshop on Semantic Evaluations (Semeval-2007), June 23–24 (pp. 64–69). Prague.

Bagga, A., & Baldwin, B. (1998). Entity-based cross-document coreferencing using the vector space model. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL’98) (pp. 79–85). Montreal.

Bakus, J., Hussin, M. F., & Kamel, M. (2002). A SOM-based document clustering using phrases. In Proceedings of the 9th International Conference on Neural Information Processing (ICONIP’02) (pp. 2212–2216). Singapore.

Dom, B. (2001). An information-theoretic external cluster-validity measure. IBM Research Report.

Ghosh, J. (2003). Scalable clustering methods for data mining. In N. Ye (Ed.), Handbook of data mining. NJ: Lawrence Erlbaum.

Gonzalo, J., & Peters, C. (2005). The impact of evaluation on multilingual text retrieval. In Proceedings of SIGIR 2005 (pp. 603–604). Salvador de Bahia.

Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of Intelligent Information Systems, 17(2–3), 107–145.

Larsen, B., & Aone, C. (1999). Fast and effective text mining using linear-time document clustering. In Knowledge Discovery and Data Mining (pp. 16–22). San Diego, CA.

Meila, M. (2003). Comparing clusterings. In Proceedings of COLT 03. Washington, DC.

Pantel, P., & Lin, D. (2002). Efficiently clustering documents with committees. In Proceedings of the PRICAI 2002 7th Pacific Rim International Conference on Artificial Intelligence (pp. 18–22). Tokyo, Japan.

Rosenberg, A., & Hirschberg, J. (2007). V-measure: A conditional entropy-based external cluster evaluation measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 410–420). Prague.

Steinbach, M., Karypis, G., & Kumar, V. (2000). A comparison of document clustering techniques. In KDD 2000 (pp. 109–110). Boston, MA.

Strehl, A. (2002). Relationship-based clustering and cluster ensembles for high-dimensional data mining. PhD thesis, The University of Texas at Austin.

Van Rijsbergen, C. (1974). Foundation of evaluation. Journal of Documentation, 30(4), 365–373.

Xu, W., Liu, X., & Gong, Y. (2003). Document clustering based on non-negative matrix factorization. In SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 267–273). NY: ACM Press.

Zhao, Y., & Karypis, G. (2001). Criterion functions for document clustering: Experiments and analysis. Technical Report TR 01-40. Department of Computer Science, University of Minnesota, Minneapolis, MN.