Bonsai: diverse and shallow trees for extreme multi-label classification

Machine Learning - Volume 109 - Pages 2099-2119 - 2020
Sujay Khandagale1, Han Xiao1, Rohit Babbar1
1Aalto University, Helsinki, Finland

Abstract

Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space including: (i) the input space which is spanned by the input features, (ii) the output space spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds—fast training which is comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail-labels. On a benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy, while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.
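The abstract compresses the pipeline into two steps: build a vector representation for every label, then recursively partition the labels with a large, unconstrained fanout so the resulting tree stays shallow. The Python sketch below illustrates those two steps under stated assumptions, not the released implementation (https://github.com/xmc-aalto/bonsai): it uses dense NumPy arrays instead of sparse matrices, off-the-shelf scikit-learn KMeans as the partitioner, and omits the per-node linear classifiers trained at prediction time; the function names (label_representations, build_tree) and all parameter values are illustrative.

```python
# Minimal sketch of Bonsai-style label representation and shallow-tree
# partitioning. Assumptions: dense arrays, sklearn KMeans as the
# multi-way partitioner, no node-level classifier training.
import numpy as np
from sklearn.cluster import KMeans

def label_representations(X, Y, space="joint"):
    """One row per label in the chosen representation space.

    X: (n_samples, n_features) input matrix.
    Y: (n_samples, n_labels) binary label matrix.
    """
    if space == "input":            # (i) spanned by the input features
        R = Y.T @ X                 # sum of feature vectors of positive instances
    elif space == "output":         # (ii) label co-occurrence with other labels
        R = Y.T @ Y
    else:                           # (iii) joint: concatenate (i) and (ii)
        R = np.hstack([Y.T @ X, Y.T @ Y])
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    return R / np.maximum(norms, 1e-12)   # guard labels with no positives

def build_tree(labels, R, K=100, max_leaf=100, depth=0):
    """Recursively K-means-partition the labels; no balance constraint,
    and a large K keeps the tree shallow."""
    if len(labels) <= max_leaf:
        return {"labels": labels, "depth": depth}
    K_eff = min(K, len(labels))
    assign = KMeans(n_clusters=K_eff, n_init=1, random_state=0).fit_predict(R[labels])
    return {
        "children": [
            build_tree(labels[assign == c], R, K, max_leaf, depth + 1)
            for c in range(K_eff)
        ],
        "depth": depth,
    }

# Toy usage: 1000 sparse-ish labels partitioned into a shallow tree.
rng = np.random.default_rng(0)
X = rng.random((5000, 50))
Y = (rng.random((5000, 1000)) > 0.995).astype(float)
R = label_representations(X, Y, space="joint")
tree = build_tree(np.arange(1000), R, K=10, max_leaf=50)
```

The fanout K is what keeps the tree shallow: depth grows roughly as log base K of the number of labels, so even millions of labels fit in a handful of levels, which is the property the abstract credits for fast training combined with good tail-label accuracy.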
