XRR: Extreme multi-label text classification with candidate retrieving and deep ranking

Information Sciences - Volume 622 - Pages 115-132 - 2023

Jie Xiong¹, Li Yu¹, Xi Niu², Youfang Leng¹

¹School of Information, Renmin University of China, Beijing, BJ 100872, China

²University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223-0001, United States

References

Gargiulo, 2019, Deep neural network for hierarchical extreme multi-label text classification, Appl. Soft Comput., 79, 125, 10.1016/j.asoc.2019.03.041

Zhang, 2014, A review on multi-label learning algorithms, IEEE Trans. Knowl. Data Eng., 26, 1819, 10.1109/TKDE.2013.39

Y. Prabhu, M. Varma, Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 2014, pp. 263–272.

Huang, 2019, Improving multi-label classification with missing labels by learning label-specific features, Inf. Sci., 492, 124, 10.1016/j.ins.2019.04.021

R. Babbar, B. Schölkopf, Adversarial extreme multi-label classification, arXiv preprint arXiv:1803.01570 (2018).

Xiao, 2022, Semantic guide for semi-supervised few-shot multi-label node classification, Inf. Sci., 591, 235, 10.1016/j.ins.2021.12.130

Hashemi, 2021, An efficient pareto-based feature selection algorithm for multi-label classification, Inf. Sci., 581, 428, 10.1016/j.ins.2021.09.052

Hu, 2022, Feature-specific mutual information variation for multi-label feature selection, Inf. Sci., 593, 449, 10.1016/j.ins.2022.02.024

K. Halder, L. Poddar, M.-Y. Kan, Cold start thread recommendation as extreme multi-label classification, in: Companion Proceedings of the Web Conference 2018, 2018, pp. 1911–1918.

H. Jain, V. Balasubramanian, B. Chunduri, M. Varma, Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches, in: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019, pp. 528–536.

J. Liu, W.-C. Chang, Y. Wu, Y. Yang, Deep learning for extreme multi-label text classification, in: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, 2017, pp. 115–124.

You, 2019, Attentionxml: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification, Adv. Neural Inform. Process. Syst., 32

T. Jiang, D. Wang, L. Sun, H. Yang, Z. Zhao, F. Zhuang, Lightxml: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification, arXiv preprint arXiv:2101.03305 (2021).

Bhatia, 2015, Sparse local embeddings for extreme multi-label classification, Adv. Neural Inform. Process. Syst., 28

Xia, 2021, Multi-label classification with weighted classifier selection and stacked ensemble, Inf. Sci., 557, 421, 10.1016/j.ins.2020.06.017

Liang, 2021, Two-stage three-way enhanced technique for ensemble learning in inclusive policy text classification, Inf. Sci., 547, 271, 10.1016/j.ins.2020.08.051

K. Dahiya, D. Saini, A. Mittal, A. Shaw, K. Dave, A. Soni, H. Jain, S. Agarwal, M. Varma, Deepxml: A deep extreme multi-label learning framework applied to short text documents, in: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021, pp. 31–39.

W.-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, I.S. Dhillon, Taming pretrained transformers for extreme multi-label text classification, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 3163–3171.

Cover, 1999

A. Mittal, N. Sachdeva, S. Agrawal, S. Agarwal, P. Kar, M. Varma, Eclare: Extreme classification with label graph correlations, in: Proceedings of the Web Conference 2021, 2021, pp. 3721–3732.

D. Saini, A.K. Jain, K. Dave, J. Jiao, A. Singh, R. Zhang, M. Varma, Galaxc: Graph neural networks with labelwise attention for extreme classification, in: Proceedings of the Web Conference 2021, 2021, pp. 3733–3744.

Dahiya, 2021, Siamesexml: Siamese networks meet extreme classifiers with 100m labels, 2330

A. Mittal, K. Dahiya, S. Agrawal, D. Saini, S. Agarwal, P. Kar, M. Varma, Decaf: Deep extreme classification with label features, in: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021, pp. 49–57.

J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 4171–4186.

J. Gao, D. He, X. Tan, T. Qin, L. Wang, T. Liu, Representation degeneration problem in training natural language generation models, in: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019.

K. Ethayarajh, How contextual are contextualized word representations? comparing the geometry of bert, elmo, and gpt-2 embeddings, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 55–65.

I.E.-H. Yen, X. Huang, P. Ravikumar, K. Zhong, I. Dhillon, Pd-sparse: A primal and dual sparse approach to extreme multiclass and multilabel classification, in: International conference on machine learning, PMLR, 2016, pp. 3069–3077.

I.E. Yen, X. Huang, W. Dai, P. Ravikumar, I. Dhillon, E. Xing, Ppdsparse: A parallel primal-dual sparse method for extreme classification, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 545–553.

R. Babbar, B. Schölkopf, Dismec: Distributed sparse machines for extreme multi-label classification, in: Proceedings of the tenth ACM international conference on web search and data mining, 2017, pp. 721–729.

Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, M. Varma, Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 993–1002.

H. Jain, Y. Prabhu, M. Varma, Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 935–944.

Liu, 2015, Mlrf: multi-label classification through random forest with label-set partition, 407

Khandagale, 2020, Bonsai: diverse and shallow trees for extreme multi-label classification, Mach. Learn., 109, 2099, 10.1007/s10994-020-05888-2

Y. Tagami, Annexml: Approximate nearest neighbor search for extreme multi-label classification, in: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, 2017, pp. 455–464.

Su, 2021, Deep low-rank matrix factorization with latent correlation estimation for micro-video multi-label classification, Inf. Sci., 575, 587, 10.1016/j.ins.2021.07.021

Wei, 2019, Does tail label help for large-scale multi-label learning?, IEEE Trans. Neural Networks Learn. Syst., 31, 2315

M. Qaraei, E. Schultheis, P. Gupta, R. Babbar, Convex surrogates for unbiased loss functions in extreme classification with missing labels, in: Proceedings of the Web Conference 2021, 2021, pp. 3711–3720.

T. Wei, W.-W. Tu, Y.-F. Li, G.-P. Yang, Towards robust prediction on tail labels, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 1812–1820.

Z. Wang, H. Wang, J.-R. Wen, Y. Xiao, An inference approach to basic level of categorization, in: Proceedings of the 24th ACM international on conference on information and knowledge management, 2015, pp. 653–662.

A. Zubiaga, Enhancing navigation on wikipedia with social tags, arXiv preprint arXiv:1202.5469 (2012).

Mikolov, 2013, Distributed representations of words and phrases and their compositionality, Adv. Neural Inform. Process. Syst., 26

Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: International conference on machine learning, PMLR, 2014, pp. 1188–1196.

Wu, 2020, Phrase2vec: phrase embedding based on parsing, Inf. Sci., 517, 100, 10.1016/j.ins.2019.12.031

Kim, 2019, Multi-co-training for document classification using various document representations: Tf–idf, lda, and doc2vec, Inf. Sci., 477, 15, 10.1016/j.ins.2018.10.006

T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Y. Bengio, Y. LeCun (Eds.), 1st International Conference on Learning Representations, ICLR 2013, 2013.

Johnson, 2019, Billion-scale similarity search with gpus, IEEE Trans. Big Data, 7, 535, 10.1109/TBDATA.2019.2921572

Y. Shen, X. He, J. Gao, L. Deng, G. Mesnil, Learning semantic representations using convolutional neural networks for web search, in: Proceedings of the 23rd international conference on world wide web, 2014, pp. 373–374.

X. Yi, J. Yang, L. Hong, D.Z. Cheng, L. Heldt, A. Kumthekar, Z. Zhao, L. Wei, E. Chi, Sampling-bias-corrected neural modeling for large corpus item recommendations, in: Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 269–277.

Khosla, 2020, Supervised contrastive learning, Adv. Neural Inform. Process. Syst., 33, 18661

T. Gao, X. Yao, D. Chen, Simcse: Simple contrastive learning of sentence embeddings, arXiv preprint arXiv:2104.08821 (2021).

J. Su, J. Cao, W. Liu, Y. Ou, Whitening sentence representations for better semantics and faster retrieval, arXiv preprint arXiv:2103.15316 (2021).

B. Li, H. Zhou, J. He, M. Wang, Y. Yang, L. Li, On the sentence embeddings from bert for semantic textual similarity, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 9119–9130.

D. Hendrycks, K. Gimpel, Gaussian error linear units (gelus), arXiv preprint arXiv:1606.08415 (2016).

I. Loshchilov, F. Hutter, Decoupled weight decay regularization, in: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019.

Loza Mencía, 2008, Efficient pairwise multilabel classification for large-scale problems in the legal domain, 50

J. McAuley, J. Leskovec, Hidden factors and hidden topics: understanding rating dimensions with review text, in: Proceedings of the 7th ACM conference on Recommender systems, 2013, pp. 165–172.

Wydmuch, 2018, A no-regret generalization of hierarchical softmax to extreme multi-label classification, Adv. Neural Inform. Process. Syst., 31, 6358
