A General Evaluation Framework for Topical Crawlers
Abstract
Keywords
References
Aggarwal C, Al-Garawi F and Yu P (2001) Intelligent crawling on the world wide web with arbitrary predicates. In: Proc. 10th International World Wide Web Conference, pp. 96–105.
Amento B, Terveen L and Hill W (2000) Does “Authority” mean quality? Predicting expert quality ratings of web documents. In: Proc. 23rd ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 296–303.
Beaulieu M, Fowkes H and Joho H (2000) Sheffield interactive experiment at TREC-9. In: Proc. 9th Text Retrieval Conference (TREC-9).
Ben-Shaul I, et al. (1999a) Adding support for dynamic and focused search with Fetuccino. Computer Networks, 31(11–16):1653–1665.
Ben-Shaul I, Herscovici M, Jacovi M, Maarek Y, Pelleg D, Shtalhaim M, Soroka V and Ur S (1999b) Adding support for dynamic and focused search with Fetuccino. Computer Networks, 31(11–16):1653–1665.
Bharat K and Henzinger M (1998) Improved algorithms for topic distillation in hyperlinked environments. In: Proc. 21st ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 104–111.
Brin S and Page L (1998) The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1–7):107–117.
Chakrabarti S, Dom B, Raghavan P, Rajagopalan S, Gibson D and Kleinberg J (1998) Automatic resource compilation by analyzing hyperlink structure and associated text. Computer Networks, 30(1–7):65–74.
Chakrabarti S, Joshi M, Punera K and Pennock D (2002a) The structure of broad topics on the web. In: Lassner D, De Roure D and Iyengar A, eds. Proc. 11th International World Wide Web Conference. ACM Press, New York, NY, pp. 251–262.
Chakrabarti S, Punera K and Subramanyam M (2002b) Accelerated focused crawling through online relevance feedback. In: Lassner D, De Roure D and Iyengar A, eds. Proc. 11th International World Wide Web Conference. ACM Press, New York, NY, pp. 148–159.
Chakrabarti S, van den Berg M and Dom B (1999) Focused crawling: A new approach to topic-specific web resource discovery. Computer Networks, 31(11–16):1623–1640.
Cho J, Garcia-Molina H and Page L (1998) Efficient crawling through URL ordering. Computer Networks, 30(1–7):161–172.
Conover W (1980) Practical Nonparametric Statistics. Wiley, New York, Chapt. 5, pp. 213–343.
Davison B (2000) Topical locality in the Web. In: Proc. 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 272–279.
De Bra P and Post R (1994) Information retrieval in the World Wide Web: Making client-based searching feasible. In: Proc. 1st International World Wide Web Conference.
Diligenti M, Coetzee F, Lawrence S, Giles CL and Gori M (2000) Focused crawling using context graphs. In: Proc. 26th International Conference on Very Large Databases (VLDB 2000). Cairo, Egypt, pp. 527–534.
Flake G, Lawrence S and Giles C (2000) Efficient identification of Web communities. In: Proc. 6th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Boston, MA, pp. 150–160.
Henzinger M, Heydon A, Mitzenmacher M and Najork M (1999) Measuring search engine quality using random walks on the Web. In: Proc. 8th International World Wide Web Conference, pp. 213–225.
Hersovici M, Jacovi M, Maarek YS, Pelleg D, Shtalhaim M and Ur S (1998) The shark-search algorithm—An application: Tailored Web site mapping. In: Proc. 7th Intl. World-Wide Web Conference.
Jansen B, Spink A and Saracevic T (2000) Real life, real users and real needs: A study and analysis of user queries on the Web. Information Processing and Management, 36(2):207–227.
Kleinberg J (1999) Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632.
Kumar S, Raghavan P, Rajagopalan S, Sivakumar D, Tomkins A and Upfal E (2000) Stochastic models for the Web graph. In: Proc. 41st Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Silver Spring, MD, pp. 57–65.
Menczer F (1997) ARACHNID: Adaptive retrieval agents choosing heuristic neighborhoods for information discovery. In: Proc. 14th International Conference on Machine Learning, pp. 227–235.
Menczer F (2003) Complementing search engines with online Web mining agents. Decision Support Systems, 35(2):195–212.
Menczer F (2004) Lexical and semantic clustering by Web links. Journal of the American Society for Information Science and Technology, 55(14):1261–1269.
Menczer F and Belew R (1998) Adaptive information agents in distributed textual environments. In: Proc. 2nd International Conference on Autonomous Agents. Minneapolis, MN, pp. 157–164.
Menczer F and Belew R (2000) Adaptive retrieval agents: Internalizing local context and scaling up to the Web. Machine Learning, 39(2–3):203–242.
Menczer F, Pant G, Ruiz M and Srinivasan P (2001) Evaluating topic-driven Web crawlers. In: Kraft DH, Croft WB, Harper DJ and Zobel J, eds. Proc. 24th Annual Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval. ACM Press, New York, NY, pp. 241–249.
Menczer F, Pant G and Srinivasan P (2004) Topical Web crawlers: Evaluating adaptive algorithms. ACM Transactions on Internet Technology, 4(4):378–419.
Mitra M, Singhal A and Buckley C (1998) Improving automatic query expansion. In: Proc. 21st ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 206–214.
Najork M and Wiener JL (2001) Breadth-first search crawling yields high-quality pages. In: Proc. 10th International World Wide Web Conference.
Nelson M (1995) The effect of query characteristics on retrieval results in the TREC retrieval tests. In: Proc. Annual Conference of the Canadian Association for Information Science.
Pant G and Menczer F (2002) MySpiders: Evolve your own intelligent Web crawlers. Autonomous Agents and Multi-Agent Systems, 5(2):221–229.
Pant G, Srinivasan P and Menczer F (2002) Exploration versus exploitation in topic driven crawlers. In: Proc. WWW-02 Workshop on Web Dynamics.
Pinkerton B (1994) Finding what people want: Experiences with the WebCrawler. In: Proc. 1st International World Wide Web Conference.
Rennie J and McCallum A (1999) Using reinforcement learning to spider the Web efficiently. In: Proc. 16th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 335–343.
Saracevic T and Kantor P (1988) A study of information seeking and retrieving. II. Users, questions, and effectiveness. Journal of the American Society for Information Science, 39(3):177–196.
Silva I, Ribeiro-Neto B, Calado P, Ziviani N and Moura E (2000) Link-based and content-based evidential information in a belief network model. In: Proceedings of the 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 96–103.
Spink A, Wolfram D, Jansen B and Saracevic T (2001) Searching the Web: The public and their queries. Journal of the American Society for Information Science, 52(3):226–234.
Srinivasan P, Mitchell J, Bodenreider O, Pant G and Menczer F (2002) Web Crawling agents for retrieving biomedical information. In: Proc. Int. Workshop on Agents in Bioinformatics (NETTAB-02).
van Rijsbergen C (1979) Information Retrieval, 2nd edn. Butterworths, London.