Leveraging mixed distribution of multi-head attention for sequential recommendation

Springer Science and Business Media LLC - Volume 53 - Pages 454-469 - 2022
Yihao Zhang (1), Xiaoyang Liu (2)
(1) School of Artificial Intelligence, Chongqing University of Technology, Chongqing, China
(2) School of Computer Science and Engineering, Chongqing University of Technology, Chongqing, China

Abstract

Attention mechanisms have proven useful for sequential recommendation. The traditional multi-head self-attention architecture can exploit the entire user sequence and adaptively weigh previously consumed items when recommending the next item. However, for a fixed embedding size, the size of each head shrinks as the number of heads grows, so the multi-head attention model suffers from a low-rank bottleneck that limits its expressive power and hurts recommendation performance. In this paper, we propose a variant of self-attention called mixed distribution of multi-head attention for sequential recommendation (MMSRec). Instead of increasing the embedding size, which is currently the dominant way of addressing the low-rank bottleneck, MMSRec constructs a mixed distribution model as a weighted average of multiple simple distributions. Extensive experiments on four real-world datasets show that MMSRec significantly outperforms state-of-the-art algorithms. The empirical evidence also shows that stacking multiple low-rank distributions effectively improves the performance of the recommendation model.
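
The idea of building an attention distribution as a weighted average of several simple distributions can be illustrated with a small sketch. This is not the MMSRec implementation, whose details are not given in the abstract. It assumes that each simple distribution is an ordinary scaled dot-product softmax over a small head dimension and that the mixture weights come from a softmax over free logits. The names `attention_distribution`, `mixed_attention`, and `mix_logits` are hypothetical, and the causal masking normally used in sequential recommendation is omitted for brevity.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_distribution(Q, K):
    """Row-wise softmax(Q K^T / sqrt(d)) for n x d matrices given as lists."""
    scale = math.sqrt(len(Q[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        for q in Q
    ]


def mixed_attention(components, mix_logits):
    """Weighted average of the attention matrices of several (Q, K) pairs.

    The mixture weights are softmax(mix_logits), so they are non-negative
    and sum to one, which keeps every row of the result a distribution.
    """
    weights = softmax(mix_logits)
    dists = [attention_distribution(Q, K) for Q, K in components]
    n = len(dists[0])
    return [
        [sum(w * dist[i][j] for w, dist in zip(weights, dists)) for j in range(n)]
        for i in range(n)
    ]


def random_matrix(rng, n, d):
    """n x d matrix of standard normal entries (toy stand-in for projections)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


# Toy setup (all sizes are illustrative assumptions): a sequence of 5 items,
# head dimension 2, and 3 mixture components.
rng = random.Random(0)
n, d, num_components = 5, 2, 3
components = [
    (random_matrix(rng, n, d), random_matrix(rng, n, d))
    for _ in range(num_components)
]
mix_logits = [0.0, 0.5, -0.5]

A = mixed_attention(components, mix_logits)
for row in A:
    print([round(p, 3) for p in row])
```

Each component's pre-softmax score matrix has rank at most `d`, which is the source of the bottleneck when `d` is smaller than the sequence length. Averaging several such softmax outputs keeps every row a valid probability distribution without enlarging the embedding, which is the general motivation the abstract describes.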
