An information-theoretic perspective of tf–idf measures