On the relationship between query characteristics and IR functions retrieval bias

Wiley - Volume 62, Issue 8 - Pages 1515-1532 - 2011
Shariq Bashir¹, Andreas Rauber¹
¹ Institute of Software Technology and Interactive Systems, Vienna University of Technology, Austria

Abstract

Quantifying the bias of retrieval functions by means of document retrievability scores has recently emerged as an important evaluation measure for recall‐oriented retrieval applications. While numerous studies have evaluated the retrieval bias of retrieval functions, solid validation of its impact on realistic types of queries is still limited. This is due to the lack of well‐accepted criteria for generating the queries used to estimate retrievability. Commonly, random queries are used to approximate document retrievability, because the query space is prohibitively large and processing all queries would take too long. Additionally, a cumulative retrievability score of each document over all queries is used to analyze the retrieval bias of retrieval functions. However, this approach does not consider the differences between query characteristics (QCs) or their influence on the quantification of retrieval bias. This article provides an in‐depth study of retrievability over different QCs. It analyzes how lower and higher retrieval bias correlate with different QCs. The strong correlation between retrieval bias and QCs observed in the experiments indicates that the retrieval bias of retrieval functions may be determined without processing an exhaustive query set. The experiments are validated on the TREC Chemical Retrieval Track collection, which consists of 1.2 million patent documents.
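To make the idea of a cumulative retrievability score concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a cutoff-based score (a document earns one point for each query that ranks it within the top `cutoff` results) and summarizes inequality across documents with the Gini coefficient, a summary commonly used in the retrievability literature but not named in the abstract. The function names, the `rankings` input format, and the use of query length as an example query characteristic are all hypothetical.

```python
from collections import defaultdict


def retrievability(rankings, cutoff):
    """Cumulative retrievability: for each document, count the queries
    that return it within the top `cutoff` ranks.

    `rankings` maps each query string to its ranked list of document ids
    (assumed input format, for illustration only).
    """
    scores = defaultdict(int)
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return dict(scores)


def gini(values):
    """Gini coefficient of non-negative scores.

    0.0 means every document is equally retrievable; values closer to 1.0
    mean retrievability is concentrated in few documents (higher bias).
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def retrieval_bias(rankings, doc_ids, cutoff):
    """Bias over the whole collection; never-retrieved documents count as 0."""
    scores = retrievability(rankings, cutoff)
    return gini([scores.get(doc_id, 0) for doc_id in doc_ids])


def bias_by_characteristic(rankings, doc_ids, cutoff, characteristic):
    """Partition queries by a characteristic and measure bias per partition."""
    groups = defaultdict(dict)
    for query, ranked_docs in rankings.items():
        groups[characteristic(query)][query] = ranked_docs
    return {
        key: retrieval_bias(subset, doc_ids, cutoff)
        for key, subset in groups.items()
    }


if __name__ == "__main__":
    # Toy data: three queries over a four-document collection.
    toy_rankings = {
        "a b": ["d1", "d2"],
        "c d": ["d1", "d3"],
        "a b c": ["d1", "d2"],
    }
    toy_docs = ["d1", "d2", "d3", "d4"]
    print(retrieval_bias(toy_rankings, toy_docs, cutoff=2))
    # Query length (number of terms) as one example characteristic.
    print(bias_by_characteristic(
        toy_rankings, toy_docs, cutoff=2,
        characteristic=lambda q: len(q.split()),
    ))
```

Computing the bias once over all queries and once per partition shows, on a small scale, the contrast the abstract draws between a single cumulative score and an analysis broken down by query characteristic.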

