Query-based sampling of text databases

ACM Transactions on Information Systems - Vol. 19, No. 2 - Pages 97-130 - 2001
Jamie Callan1, Margaret E. Connell2
1Carnegie Mellon Univ.
2Univ. of Massachusetts

Abstract

The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGLOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents query-based sampling, a new technique for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are created, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic database selection.
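The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea: send one-term queries to a database through its ordinary search interface, read the few documents returned, and build the resource description (term frequencies) from that sample. The `ToyDatabase` class, the `query_based_sample` function, the parameter names, and the choice of drawing the next query term at random from the vocabulary learned so far are all illustrative assumptions, not details taken from the text above.

```python
import random
from collections import Counter


def tokenize(text):
    """Very simple whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


class ToyDatabase:
    """Hypothetical stand-in for a remote, uncooperative search engine.

    The sampler only ever calls search(); it never sees the index.
    """

    def __init__(self, docs):
        self._docs = docs

    def search(self, term, top_n):
        # Rank documents by how often they contain the query term.
        hits = []
        for doc_id, text in enumerate(self._docs):
            count = tokenize(text).count(term)
            if count > 0:
                hits.append((count, doc_id))
        hits.sort(key=lambda h: (-h[0], h[1]))
        return [(doc_id, self._docs[doc_id]) for _, doc_id in hits[:top_n]]


def query_based_sample(db, seed_term, docs_per_query=2, max_docs=3,
                       max_queries=50, rng=None):
    """Build a resource description by querying and reading the results."""
    rng = rng or random.Random(0)
    seen = set()        # ids of documents already examined
    df = Counter()      # number of sampled documents containing each term
    ctf = Counter()     # total occurrences of each term in the sample
    used = set()        # query terms already sent
    term = seed_term
    queries = 0
    while term is not None and len(seen) < max_docs and queries < max_queries:
        used.add(term)
        queries += 1
        for doc_id, text in db.search(term, docs_per_query):
            if doc_id in seen:
                continue
            seen.add(doc_id)
            tokens = tokenize(text)
            ctf.update(tokens)
            df.update(set(tokens))
            if len(seen) >= max_docs:
                break
        # Pick the next query term from the vocabulary learned so far.
        candidates = sorted(t for t in df if t not in used)
        term = rng.choice(candidates) if candidates else None
    return {"docs_sampled": len(seen), "queries": queries, "df": df, "ctf": ctf}


db = ToyDatabase([
    "apple banana apple",
    "banana cherry",
    "cherry date",
    "date egg",
])
description = query_based_sample(db, seed_term="apple")
```

The resulting `description` holds the kind of term statistics that a database selection algorithm could consume. Note that the sampler needs nothing from the provider beyond the ability to run queries and fetch the returned documents, which is the property the abstract emphasizes.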
