Probabilistic models in IR and their relationships

Springer Science and Business Media LLC - Volume 17 - Pages 177-201 - 2013

Robin Aly¹, Thomas Demeester², Stephen Robertson³

¹University of Twente, Enschede, The Netherlands

²Ghent University, iMinds, Ghent, Belgium

³University College London, London, UK

Abstract

A solid research path towards new information retrieval models is to further develop the theory behind existing models. A profound understanding of these models is therefore essential. In this paper, we revisit probability ranking principle (PRP)-based models, probability of relevance (PR) models, and language models, finding conceptual differences in their definition and interrelationships. The probabilistic model of the PRP has not been explicitly defined previously, but doing so leads to the formulation of two actual principles with different objectives. First, the belief probability ranking principle (BPRP), which considers uncertain relevance between known documents and the current query, and second, the popularity probability ranking principle (PPRP), which considers the probability of relevance of documents among multiple queries with the same features. Our analysis shows how some of the discussed PR models implement the BPRP or the PPRP while others do not. However, for some models the parameter estimation is challenging. Finally, language models are often presented as related to PR models. However, we find that language models differ from PR models in every aspect of a probabilistic model and the effectiveness of language models cannot be explained by the PRP.

