An information-theoretic perspective of tf–idf measures

Information Processing & Management, Volume 39, Issue 1, Pages 45-65, 2003
Akiko Aizawa
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan

References

Aizawa, 2000, The feature quantity: an information-theoretic perspective of tfidf-like measures, 104

Aizawa, 2001, Linguistic techniques to improve the performance of automatic text categorization, 307

Amati, 1998, Semantic information retrieval, 189

Baayen, 2001

Baeza-Yates, 1988

Brookes, 1972, The Shannon model of IR systems, Journal of Documentation, 28, 160, 10.1108/eb026537

Church, 1999, Inverse document frequency (IDF): a measure of deviations from Poisson, 283

Church, 1990, Word association norms, mutual information and lexicography, Computational Linguistics, 16, 22

Cover, 1991

Crestani, 2000, Exploiting the similarity of non-matching terms at retrieval time, Journal of Information Retrieval, 2, 23, 10.1023/A:1009973415168

Croft, 1979, Using probabilistic models of document retrieval without relevance information, Journal of Documentation, 35, 285, 10.1108/eb026683

Dennis, 1964, The construction of a thesaurus automatically from a sample of text

Dunning, 1993, Accurate methods for the statistics of surprise and coincidence, Computational Linguistics, 19, 61

Fuhr, 1989, Models for retrieval with probabilistic indexing, Information Processing and Management, 25, 55, 10.1016/0306-4573(89)90091-5

Fung, 1996, A technical word and term translation aid using noisy parallel corpora across language groups, The Machine Translation Journal, 12, 53

Grefenstette, 1994

Greiff, 1998, A theory of term weighting based on exploratory data analysis, 11

Hiemstra, 2000, A probabilistic justification for using tf×idf term weighting in information retrieval, International Journal on Digital Libraries, 3, 131, 10.1007/s007999900025

Joachims, 1997, A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization, 143

Kageura, 1996, Methods of automatic term recognition: a review, Terminology, 3, 259, 10.1075/term.3.2.03kag

Kageura, 1999, Evaluation of the term recognition task, 42

Kita, 1999

Koller, 1996, Toward optimal feature selection, 284

Koller, 1997, Hierarchically classifying documents using very few words, 170

Lewis, 1994, Comparison of two learning algorithms for text categorization, 81

Luhn, 1957, A statistical approach to mechanized encoding and searching of literary information, IBM Journal of Research and Development, 1, 309, 10.1147/rd.14.0309

Manning, 1999

Matsumoto, 1999

McCallum, 1998

Mladenić, 1998, Feature subset selection in text-learning, 95

Nagao, 1976, An automated method for the extraction of important words from Japanese scientific documents, Transactions of Information Processing Society of Japan, 17, 110

National Center for Science Information Systems (1999). NTCIR workshop 1: Proceedings of the first NTCIR workshop on research in Japanese text retrieval and term recognition. http://research.nii.ac.jp/~ntcadm/workshop/OnlineProceedings/

Robertson, 1990, On term selection for query expansion, Journal of Documentation, 46, 359, 10.1108/eb026866

Robertson, 1994, Query-document symmetry and dual models, Journal of Documentation, 50, 233, 10.1108/eb026932

Robertson, 1976, Relevance weighting of search terms, Journal of the American Society for Information Science, 27, 129, 10.1002/asi.4630270302

Salton, 1988, Weighting approaches in automatic text retrieval, Information Processing and Management, 24, 513, 10.1016/0306-4573(88)90021-0

Salton, 1983

Slonim, 2000, Document clustering using word clusters via the information bottleneck method, 208

Smadja, 1993, Retrieving collocations from text: Xtract, Computational Linguistics, 19, 143

Sparck-Jones, 1972, A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, 28, 11, 10.1108/eb026526

Van Rijsbergen, 1981, The selection of good search terms, Information Processing and Management, 17, 77, 10.1016/0306-4573(81)90029-7

Wiener, 1995, A neural network approach to topic spotting, 317

Wong, 1992, An information-theoretic measure of term specificity, Journal of the American Society for Information Science, 43, 54, 10.1002/(SICI)1097-4571(199201)43:1<54::AID-ASI5>3.0.CO;2-A

Yang, 1999, A re-examination of text categorization methods, 42

Yang, 1997, A comparative study on feature selection in text categorization, 412