Locality in search engine queries and its implications for caching

Proceedings - IEEE INFOCOM - Tập 3 - Trang 1238-1247 vol.3

Yinglian Xie^1,2, D. O'Hallaron³

¹Department of Computer Science, Carnegie Mellon University, USA

²Carnegie Mellon University, Pittsburgh, PA, US

³Dept. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA

Tóm tắt

Caching is a popular technique for reducing both server load and user response time in distributed systems. We consider the question of whether caching might be effective for search engines as well. We study two real search engine traces by examining query locality and its implications for caching. Our trace analysis produced three results. One result shows that queries have significant locality, with query frequency following a Zipf distribution. Very popular queries are shared among different users and can be cached at servers or proxies, while 16% to 22% of the queries are from the same users and should be cached at the user side. Multiple-word queries are shared less and should be cached mainly at the user side. Another result shows that if caching is to be done at the user side, short-term caching for hours is enough to cover query temporal locality, while server/proxy caching should use longer periods, such as days. The third result showed that most users have small lexicons when submitting queries. Frequent users who submit many search requests tend to reuse a small subset of words to form queries. Thus, with proxy or user side caching, prefetching based on the user lexicon looks promising.

Từ khóa

#Search engines #Delay #Network servers #Frequency #Prefetching #Computer science #Computer networks #Bandwidth #Distributed computing

Tài liệu tham khảo

0 0 10.1016/S0140-3664(00)00308-X 10.1016/S0140-3664(00)00308-X 10.1145/339331.339357 10.1145/258612.258668 feldmann, 0, Performance of web proxy caching in heterogenous environments, Proc IEEE INFOCOM New York NY Mar 1999 taylor, 0, Using distributed query result caching to evaluate queries for parallel data mining algorithms, Proc of the Int Conf on Parallel and Distributed Techniques and Applications 1998 10.1145/281250.281253 kroeger, 0, Exploring the bounds of web latency reduction from caching and prefetching, USENIX Proceedings of the Symposium on Internet Technologies and Systems 99 December 1997 cao, 0, Web prefetching between low-bandwidth clients and proxies: Potential and performance, Proceedings of the ACM SIGMETRICS Conference Atlanta GE May 1999 rousskov, 0, On performance of caching proxies, Proceedings of the ACM SIGMETRICS Conference Madison WI June 1998 adali, 0, Query caching and optimization in distributed mediator systems, Proc of the 1996 ACM SIGMOD Conf on Management of Data 1996, 137 silverstein, 1998, Analysis of a very large alta vista query log

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA