Factorized weight interaction neural networks for sparse feature prediction

Neural Computing and Applications - Volume 32 - Pages 9567-9579 - 2019
Dafang Zou1, Mengmeng Sheng1, Hui Yu1, Jiafa Mao1, Shengyong Chen1, Weiguo Sheng2
1Zhejiang University of Technology, Hangzhou, China
2Hangzhou Normal University, Hangzhou, China

Abstract

Non-contiguous, categorical sparse feature data are widespread on the Internet. To build a machine learning system on such data, it is important to model the interactions among features properly. In this paper, we propose a factorized weight interaction neural network (INN) with a new network structure, called the weight-interaction layer, that learns patterns from feature interactions together with factorized weight parameters for each feature interaction. The weight-interaction layer greatly reduces the dimensionality of the sparse data, while the multi-layer neural network captures high-order latent feature patterns. Experimental results on two real datasets show that the proposed method effectively improves the prediction accuracy and generalization performance of the model and consistently outperforms the related methods it is compared with.
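
The abstract does not give the equations of the weight-interaction layer, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes that each active sparse feature i has an embedding `V[i]` and a low-rank weight vector `P[i]`, that the weight of the interaction between features i and j is factorized as the dot product of `P[i]` and `P[j]`, and that the weighted element-wise products of the embeddings are summed into one dense vector that is then passed to a small multi-layer network. All names (`weight_interaction_layer`, `inn_forward`, `V`, `P`, `W1`, `w2`) and all sizes are hypothetical.

```python
import random


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def weight_interaction_layer(active, V, P):
    """Pool the weighted pairwise interactions of the active features.

    active: indices of the non-zero (one-hot) features of one sample
    V:      embedding table, one k-dimensional row per feature (assumed)
    P:      weight-factor table, one r-dimensional row per feature (assumed)
    Returns a dense k-dimensional vector, whatever the size of the sparse input.
    """
    k = len(V[0])
    out = [0.0] * k
    for a in range(len(active)):
        for b in range(a + 1, len(active)):
            i, j = active[a], active[b]
            w_ij = dot(P[i], P[j])  # assumed factorized interaction weight
            for d in range(k):
                out[d] += w_ij * V[i][d] * V[j][d]  # element-wise embedding product
    return out


def inn_forward(active, V, P, W1, b1, w2, b2):
    """Interaction layer followed by one ReLU hidden layer and a linear output."""
    pooled = weight_interaction_layer(active, V, P)
    hidden = [max(0.0, dot(row, pooled) + bias) for row, bias in zip(W1, b1)]
    return dot(w2, hidden) + b2


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian initialisation; the scale is an arbitrary choice."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_features, k, r, n_hidden = 1000, 8, 4, 16  # illustrative sizes only
V = random_matrix(n_features, k, rng)
P = random_matrix(n_features, r, rng)
W1 = random_matrix(n_hidden, k, rng)
b1 = [0.0] * n_hidden
w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
b2 = 0.0

active = [3, 157, 842]  # one sample with three active categorical features
dense = weight_interaction_layer(active, V, P)
score = inn_forward(active, V, P, W1, b1, w2, b2)
print(len(dense), score)
```

Under these assumptions, a sample drawn from a 1000-dimensional one-hot space is reduced to an 8-dimensional dense vector before it reaches the hidden layer, which is the kind of dimensionality reduction the abstract attributes to the weight-interaction layer. Training, regularization, and the loss function are left out of the sketch.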
