An analysis of NP-completeness in novelty and diversity ranking

Springer Science and Business Media LLC - Tập 14 - Trang 89-106 - 2010

Ben Carterette¹

¹Department of Computer and Info. Sciences, University of Delaware, Newark, USA

Tóm tắt

A useful ability for search engines is to be able to rank objects with novelty and diversity: the top k documents retrieved should cover possible intents of a query with some distribution, or should contain a diverse set of subtopics related to the user’s information need, or contain nuggets of information with little redundancy. Evaluation measures have been introduced to measure the effectiveness of systems at this task, but these measures have worst-case NP-hard computation time. The primary consequence of this is that there is no ranking principle akin to the Probability Ranking Principle for document relevance that provides uniform instruction on how to rank documents for novelty and diversity. We use simulation to investigate the practical implications of this for optimization and evaluation of retrieval systems.

Tài liệu tham khảo

Agrawal, R., Gollapudi, S., Halverson, H., & Ieong, S. (2009). Diversifying search results. In Proceedings of the 2nd ACM international conference on web search and data mining (pp. 5–14). Allan, J., Carterette, B., & Lewis, J. (2005). When will information retrieval be ’good enough?’. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 433–440). Bogdanov, A., & Trevisan, L. (2006). Average-case complexity. Hanover, MA: Now Publishers Inc. Carbonell, J. G., & Goldstein, J. (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval (pp. 335–336). Carterette, B., & Chandar, P. (2009). Probabilistic models of novel document rankings for faceted topic retrieval. In Proceedings of the 18th ACM international conference on information and knowledge management. Chen, H., & Karger, D. R. (2006). Less is more: Probabilistic models for retrieving fewer relevant documents. In Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval (pp. 429–436). Clarke, C. L., Craswell, N., & Soboroff, I. (2009a). Overview of the TREC 2009 web track. In Proceedings of the 18th text retrieval conference (TREC). Clarke, C. L. A., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., et al. (2008). Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 659–666). Clarke, C. L., Kolla, M., & Vechtomova, O. (2009b). An effectiveness measure for ambiguous and underspecified queries. In Advances in information retrieval theory: Proceedings of the 2nd international conference on the theory of information retrieval (pp. 188–199). Feige, U. (1998). A threshold of ln n for approximating set cover. Journal of the ACM, 45(4),634–652. Goffman, W. (1964). On relevance as a measure. Information Storage and Retrieval, 2(3), 201–203. Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of ir techniques. ACM Transactions on Information and Systems, 20(4), 422–446. Lenstra, J. K., Kan, A. H. G. R., & Brucker, P. (1977). Complexity of machine scheduling problems. In P. L. Hammer (Ed.), Studies in integer programming (Vol. 1). North Holland: Addison-Wesley. Li, P., Burges, C. J., & Wu, Q. (2008). McRank: Learning to rank using multiple classification and gradient boosting. In Advances in neural information processing systems (Vol. 20, pp. 897–904). Moffat, A., & Zobel, J. (2008). Rank-biased precision for measurement of retrieval effectiveness. ACM Transactions on Information and Systems, 27, 1–27. Radlinski, F., Kleinberg, R., & Joachims, T. (2008). Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th international conference on machine learning (pp. 784–791). Robertson, S. E. (1977). The probability ranking principle in information retrieval. Journal of Documentation, 33, 294–304. Vee, E., Srivastava, U., Shanmugasundaram, J., Bhat, P., & Amer-Yahia, S. (2008). Efficient computation of diverse query results. In Proceedings of the 24th international conference on data engineering (pp. 228–236). Zaman, A., & Simberloff, D. (2002). Random binary matrices in biogeographical ecology—instituting a good neighbor policy. Environmental and Ecological Statistics, 9, 405–421. Zhai, C., Cohen, W. W., & Lafferty, J. D. (2003). Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 10–17). Zipf, G. K. (1949). Human behavior and the principle of least-effort. Cambridge, MA: Addison-Wesley.

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA