Clustering of conversational bandits with posterior sampling for user preference learning and elicitation

User Modeling and User-Adapted Interaction - Volume 33 - Pages 1065-1112 - 2023
Qizhi Li¹, Canzhe Zhao¹, Tong Yu², Junda Wu³, Shuai Li¹
¹Shanghai Jiao Tong University, Shanghai, China
²Carnegie Mellon University, Pittsburgh, USA
³New York University, New York City, USA

Abstract

Conversational recommender systems elicit user preferences through conversational interactions. By introducing conversational key-terms, existing conversational recommenders can effectively reduce the extensive exploration required by a traditional interactive recommender. However, existing approaches that elicit user preferences via key-terms still have limitations. First, the key-term data of the items must be carefully labeled, which requires substantial human effort. Second, the number of human-labeled key-terms is limited and their granularity is fixed, whereas the elicited user preferences usually progress from coarse-grained to fine-grained over the course of a conversation. In this paper, we propose a clustering of conversational bandits algorithm. To avoid human labeling effort and to automatically learn key-terms with the proper granularity, we cluster the items online and generate meaningful key-terms for them during the conversational interactions. Our algorithm is general and can also be applied to user clustering when feedback from multiple users is available, which further leads to more accurate learning and generation of conversational key-terms. Moreover, to learn the user clustering structure more efficiently when that structure is more complex, we further propose a simple yet effective soft user clustering module that explores the user clustering by sampling from the posterior of the user representations. We analyze the regret bound of our learning algorithm. In empirical evaluations, without using any human-labeled key-terms, our algorithm effectively generates meaningful coarse-to-fine-grained key-terms and performs as well as or better than the state-of-the-art baseline.
