Classifying documents with link-based bibliometric measures
Springer Science and Business Media LLC - Volume 13 - Pages 315-345 - 2009
T. Couto, N. Ziviani, P. Calado, M. Cristo, M. Gonçalves, E. S. de Moura, W. Brandão
Automatic document classification can be used to organize documents in a digital library, construct on-line directories, improve the precision of web searching, or help the interactions between user and search engines. In this paper we explore how linkage information inherent to different document collections can be used to enhance the effectiveness of classification algorithms. We have experiment...
ReBoost: a retrieval-boosted sequence-to-sequence model for neural response generation
Springer Science and Business Media LLC - Volume 23 - Pages 27-48 - 2019
Yutao Zhu, Zhicheng Dou, Jian-Yun Nie, Ji-Rong Wen
Human–computer conversation is an active research topic in natural language processing. One of the representative methods to build conversation systems uses the sequence-to-sequence (Seq2seq) model through neural networks. However, with limited input information, the Seq2seq model tends to generate meaningless and trivial responses. It can be greatly enhanced if more supplementary information is p...
An analysis of NP-completeness in novelty and diversity ranking
Springer Science and Business Media LLC - Volume 14 - Pages 89-106 - 2010
Ben Carterette
A useful ability for search engines is to be able to rank objects with novelty and diversity: the top k documents retrieved should cover possible intents of a query with some distribution, or should contain a diverse set of subtopics related to the user’s information need, or contain nuggets of information with little redundancy. Evaluation measures have been introduced to measure the effectivenes...
Using the Web as corpus for self-training text categorization
Springer Science and Business Media LLC - Volume 12 - Pages 400-415 - 2008
Rafael Guzmán-Cabrera, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor-Pineda
Most current methods for automatic text categorization are based on supervised learning techniques and, therefore, they face the problem of requiring a great number of training instances to construct an accurate classifier. In order to tackle this problem, this paper proposes a new semi-supervised method for text categorization, which considers the automatic extraction of unlabeled examples from t...
Distance matters! Cumulative proximity expansions for ranking documents
Springer Science and Business Media LLC - Volume 17 - Pages 380-406 - 2014
Jeroen B. P. Vuurens, Arjen P. de Vries
In the information retrieval process, functions that rank documents according to their estimated relevance to a query typically regard query terms as being independent. However, it is often the joint presence of query terms that is of interest to the user, which is overlooked when matching independent terms. One feature that can be used to express the relatedness of co-occurring terms is their pr...
Probabilistic models in IR and their relationships
Springer Science and Business Media LLC - Volume 17 - Pages 177-201 - 2013
Robin Aly, Thomas Demeester, Stephen Robertson
A solid research path towards new information retrieval models is to further develop the theory behind existing models. A profound understanding of these models is therefore essential. In this paper, we revisit probability ranking principle (PRP)-based models, probability of relevance (PR) models, and language models, finding conceptual differences in their definition and interrelationships. The p...
Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval
Springer Science and Business Media LLC - Volume 16 - Pages 63-90 - 2012
Katja Hofmann, Shimon Whiteson, Maarten de Rijke
As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank, retrieval systems can learn directly from implicit feedback inferred from user interactions. In such an online setting, algorithms must obtain feedback for effective learning while simultaneously utilizing what has already been learned to ...
A relatedness analysis of government regulations using domain knowledge and structural organization
Springer Science and Business Media LLC - Volume 9 - Pages 657-680 - 2006
Gloria T. Lau, Kincho H. Law, Gio Wiederhold
The complexity and diversity of government regulations make understanding and retrieval of regulations a non-trivial task. One of the issues is the existence of multiple sources of regulations and interpretive guides with differences in format, terminology and context. This paper describes a comparative analysis scheme developed to help retrieval of related provisions from different regulatory doc...