Publisher: Wiley
ISSN: 0002-8231
eISSN: 1097-4571
Sponsoring organization: N/A
Notable articles
A new form of document coupling called co-citation is defined as the frequency with which two documents are cited together. The co-citation frequency of two scientific papers can be determined by comparing the lists of documents citing them in …
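The counting itself is mechanical once lists of citing documents are in hand; a minimal sketch follows, assuming those lists are available as Python sets (the identifiers and data are illustrative, not drawn from the article).

```python
# Co-citation frequency as defined above: the number of documents
# that cite both papers, i.e. the overlap of their citing lists.
# The data and identifiers below are illustrative.

def cocitation_frequency(citing_a: set[str], citing_b: set[str]) -> int:
    """Count how often two documents are cited together."""
    return len(citing_a & citing_b)

citing_docs = {
    "paper_A": {"d1", "d2", "d3", "d5"},
    "paper_B": {"d2", "d3", "d4"},
}

print(cocitation_frequency(citing_docs["paper_A"], citing_docs["paper_B"]))  # 2
```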
This paper examines statistical techniques for exploiting relevance information to weight search terms. These techniques are presented as a natural extension of weighting methods using information about the distribution of index terms in documents in general. A series of relevance weighting functions is derived and is justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval. Different applications of relevance weighting are illustrated by experimental results for test collections.
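A well-known weight from this line of work is the Robertson/Sparck Jones relevance weight; the sketch below uses the familiar 0.5-corrected form, which is an assumption for illustration rather than a formula quoted from this abstract.

```python
import math

# Relevance weight for a search term, in the 0.5-corrected form often
# associated with this line of work. Symbols follow the usual convention:
#   N: documents in the collection     R: known relevant documents
#   n: documents containing the term   r: relevant documents containing it

def relevance_weight(N: int, R: int, n: int, r: int) -> float:
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((R - r + 0.5) * (n - r + 0.5)))

# A term present in 12 of 20 known relevant documents but in only 300 of
# 10,000 documents overall receives a strongly positive weight.
print(relevance_weight(N=10_000, R=20, n=300, r=12))  # ~3.9
```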
A Cumulative Advantage Distribution is proposed which models statistically the situation in which success breeds success. It differs from the Negative Binomial Distribution in that lack of success, being a non‐event, is not punished by increased chance of failure. It is shown that such a stochastic law is governed by the Beta Function, containing only one free parameter, and this is approximated by a skew or hyperbolic distribution of the type that is widespread in bibliometrics and diverse social science phenomena. In particular, this is shown to be an appropriate underlying probabilistic theory for the Bradford Law, the Lotka Law, the Pareto and Zipf Distributions, and for all the empirical results of citation frequency analysis. As side results one may derive also the obsolescence factor for literature use. The Beta Function is peculiarly elegant for these manifold purposes because it yields both the actual and the cumulative distributions in simple form, and contains a limiting case of an inverse square law to which many empirical distributions conform.
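In the one-parameter form often quoted for this model, the fraction of items with exactly n successes is f(n) = m·B(n, m + 1); the sketch below adopts that parameterization as an assumption and shows the inverse-square limiting case at m = 1.

```python
import math

# One-parameter Beta-function law in the form often quoted for this
# model: f(n) = m * B(n, m + 1). This parameterization is an assumption
# here, not a formula quoted from the abstract.

def beta(a: float, b: float) -> float:
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def cad_pmf(n: int, m: float) -> float:
    """Fraction of items achieving exactly n successes."""
    return m * beta(n, m + 1)

# At m = 1 the law reduces to 1 / (n * (n + 1)), the inverse-square
# limiting case mentioned in the abstract.
for n in (1, 2, 10, 100):
    print(n, cad_pmf(n, m=1.0), 1 / (n * (n + 1)))
```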
It is shown that the mapping of a particular area of science, in this case information science, can be done using authors as units of analysis and the cocitations of pairs of authors as the variable that indicates their “distances” from each other. The analysis assumes that the more two authors are cited together, the closer the relationship between them. The raw data are cocitation counts drawn online from Social Scisearch (…
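Building the raw counts and turning them into distances can be sketched directly; the count-to-distance conversion below (1 / (1 + count)) is illustrative only, since the article's own scaling procedure is not given here, and the authors and papers are made up.

```python
from itertools import combinations

# Pairwise author cocitation counts from citing-paper data, followed by
# a simple count-to-distance conversion: the more often two authors are
# cited together, the smaller their "distance". Data are made up, and
# the 1 / (1 + count) conversion is illustrative only.

papers_cite = [  # each entry: the set of authors cited by one paper
    {"Salton", "Sparck Jones"},
    {"Salton", "Sparck Jones", "Garfield"},
    {"Garfield", "Small"},
]

def cocitation_counts(papers: list[set[str]]) -> dict[tuple[str, str], int]:
    counts: dict[tuple[str, str], int] = {}
    for cited in papers:
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

counts = cocitation_counts(papers_cite)
distances = {pair: 1.0 / (1.0 + c) for pair, c in counts.items()}
print(distances)  # smaller value = more closely related authors
```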
It is argued that a user's subjective evaluation of the personal utility of a retrieval system's output to him, if it could be properly quantified, would be a near‐ideal measure of retrieval effectiveness. A hypothetical methodology is presented for measuring this utility by means of an elicitation procedure. Because the hypothetical methodology is impractical, compromise methods are outlined and their underlying simplifying assumptions are discussed. The more plausible the simplifying assumptions on which a performance measure is based, the better the measure. This, along with evidence gleaned from ‘validation experiments’ of a certain kind, is suggested as a criterion for selecting or deriving the best measure of effectiveness to use under given test conditions.
It was argued in Part I (see JASIS, March‐April 1973 p. 87) that the best way to evaluate a retrieval system is, in principle at least, to elicit subjective estimates of the system's utility to its users, quantified in terms of the numbers of utiles (e.g. dollars) they would have been willing to give up in exchange for the privilege of using the system; and a naive methodology was outlined for evaluating retrieval systems on this basis. But the impracticality of the naive evaluation procedure as it stands raises the questions: How can one decide which practical measure is likely to yield results most closely resembling those of the naive methodology? And how can one tell whether the resemblance is close enough to make applying the measure worthwhile? In the present paper two kinds of solution to these problems are taken up. The first answers the questions in terms of the reasonableness of the simplifying assumptions needed to get from the naive measure to the proposed substitute. The second answers them by experimentation.
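The “naive” measure described in these two abstracts reduces to a simple aggregation of elicited utiles; a minimal sketch follows, with aggregation by arithmetic mean assumed for illustration.

```python
# The "naive" utility measure sketched in the two abstracts above: each
# user states the number of utiles (e.g. dollars) the system's output
# was worth to him, and the scores are aggregated. Aggregation by
# arithmetic mean is an assumption for illustration.

def utility_score(elicited_utiles: list[float]) -> float:
    """Average stated worth of the system's output across users."""
    return sum(elicited_utiles) / len(elicited_utiles)

print(utility_score([3.50, 0.0, 12.0, 5.25]))  # 5.1875
```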
This article focuses on the human interaction characteristics of an information retrieval system, suggests some design considerations to improve man‐machine cooperation, and describes a research system at Stanford that is exploring some of these techniques.
Librarians can only be of limited assistance in helping the naive user formulate an unstructured feeling in his mind into an appropriate search query that maps into the retrieval system. Consequently, the process of query formulation by the user, interactively with the information available in the system, remains one of the principal problems in information retrieval today.
In an attempt to solve this problem by improving the interface communication between man and the computer, we have pursued the objective of displaying hierarchically structured index trees on a CRT in a decision tree format permitting the user merely to point (with a light pen) at alternatives which seem most appropriate to him. Using his passive rather than his active vocabulary expands his interaction vocabulary by at least an order of magnitude. Moreover, a hierarchically displayed index is a modified thesaurus, and may be augmented by adding lateral links to provide semantic assistance to the user. A hierarchical structure was chosen because it seems to replicate the structure of cognitive thought processes most closely, thus allowing the simplest, most direct transfer of the man's problem into the structure and vocabulary of the system.
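The data structure described, a hierarchical index whose nodes also carry lateral, thesaurus-like links, can be sketched briefly; the class and field names below are illustrative, not taken from the Stanford system.

```python
from dataclasses import dataclass, field

# A hierarchically structured index node: children form the decision
# tree the user points at, while lateral links supply the thesaurus-like
# semantic assistance described above. Names are illustrative.

@dataclass
class IndexNode:
    term: str
    children: list["IndexNode"] = field(default_factory=list)
    lateral: list["IndexNode"] = field(default_factory=list)  # related terms

    def options(self) -> list[str]:
        """Alternatives displayed for the user to point at."""
        return [c.term for c in self.children] + [
            f"see also: {l.term}" for l in self.lateral
        ]

retrieval = IndexNode("information retrieval")
indexing = IndexNode("indexing", lateral=[retrieval])
root = IndexNode("library science", children=[retrieval, indexing])
print(root.options())      # ['information retrieval', 'indexing']
print(indexing.options())  # ['see also: information retrieval']
```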
This paper is in response to William S. Cooper: “On Selecting a Measure of Retrieval Effectiveness.” Journal of the American Society for Information Science. 1973;24(2):87–100.
As an exploration of the frustration of users of an online interactive retrieval system, students from the School of Library Science of Syracuse University participated in an experiment using an experimental reference retrieval system for library literature on the IBM System 360/50. The searching consisted of sample searches using keywords. The data base contained library literature citations for the year 1970.
In the control group, students were instructed to locate literature related to library management and information retrieval systems. The particular terms in the search and the format were outlined in an instruction session before the students used the system.
The experimental group was not restricted to a sample search or to specified search terms, but the format of its searches was to be the same as that of the control group.
It was anticipated that significant variations in the behavior of the users would be displayed and identified by comparing measures of behavior as the man‐computer interaction proceeded through the search process.