Locality in search engine queries and its implications for caching

Proceedings - IEEE INFOCOM - Tập 3 - Trang 1238-1247 vol.3
Yinglian Xie1,2, D. O'Hallaron3
1Department of Computer Science, Carnegie Mellon University, USA
2Carnegie Mellon University, Pittsburgh, PA, US
3Dept. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA

Tóm tắt

Caching is a popular technique for reducing both server load and user response time in distributed systems. We consider the question of whether caching might be effective for search engines as well. We study two real search engine traces by examining query locality and its implications for caching. Our trace analysis produced three results. One result shows that queries have significant locality, with query frequency following a Zipf distribution. Very popular queries are shared among different users and can be cached at servers or proxies, while 16% to 22% of the queries are from the same users and should be cached at the user side. Multiple-word queries are shared less and should be cached mainly at the user side. Another result shows that if caching is to be done at the user side, short-term caching for hours is enough to cover query temporal locality, while server/proxy caching should use longer periods, such as days. The third result showed that most users have small lexicons when submitting queries. Frequent users who submit many search requests tend to reuse a small subset of words to form queries. Thus, with proxy or user side caching, prefetching based on the user lexicon looks promising.

Từ khóa

#Search engines #Delay #Network servers #Frequency #Prefetching #Computer science #Computer networks #Bandwidth #Distributed computing

Tài liệu tham khảo

0 0 10.1016/S0140-3664(00)00308-X 10.1016/S0140-3664(00)00308-X 10.1145/339331.339357 10.1145/258612.258668 feldmann, 0, Performance of web proxy caching in heterogenous environments, Proc IEEE INFOCOM New York NY Mar 1999 taylor, 0, Using distributed query result caching to evaluate queries for parallel data mining algorithms, Proc of the Int Conf on Parallel and Distributed Techniques and Applications 1998 10.1145/281250.281253 kroeger, 0, Exploring the bounds of web latency reduction from caching and prefetching, USENIX Proceedings of the Symposium on Internet Technologies and Systems 99 December 1997 cao, 0, Web prefetching between low-bandwidth clients and proxies: Potential and performance, Proceedings of the ACM SIGMETRICS Conference Atlanta GE May 1999 rousskov, 0, On performance of caching proxies, Proceedings of the ACM SIGMETRICS Conference Madison WI June 1998 adali, 0, Query caching and optimization in distributed mediator systems, Proc of the 1996 ACM SIGMOD Conf on Management of Data 1996, 137 silverstein, 1998, Analysis of a very large alta vista query log