Comprehensive analysis of embeddings and pre-training in NLP

Computer Science Review - Tập 42 - Trang 100433 - 2021
Jatin Karthik Tripathy1, Sibi Chakkaravarthy Sethuraman1, Meenalosini Vimal Cruz2, Anupama Namburu1, Mangalraj P.1, Nandha Kumar R.1, Sudhakar Ilango S1, Vaidehi Vijayakumar3
1School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India
2Department of Information Technology, Georgia Southern University, GA, USA
3Mother Teresa Women’s University, Kodaikanal, Tamilnadu, India

Tài liệu tham khảo

Hinton, 2012, Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Process. Mag., 29, 82, 10.1109/MSP.2012.2205597 Dahl, 2011, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio Speech Lang. Process., 20, 30, 10.1109/TASL.2011.2134090 J. Dean, G.S. Corrado, R. Monga, K. Chen, M. Devin, Q.V. Le, M.Z. Mao, M. Ranzato, A. Senior, P. Tucker, et al. Large scale distributed deep networks, in: Proceedings of the 25th International Conference on Neural Information Processing Systems-Volume 1, 2012, pp. 1223–1231. Krizhevsky, 2012, Imagenet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst., 25, 1097 LeCun, 1998, Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278, 10.1109/5.726791 Senior, 2020, Improved protein structure prediction using potentials from deep learning, Nature, 577, 706, 10.1038/s41586-019-1923-7 Agarap, 2018 Lu, 2018 Huang, 2015 Gregor, 2015, Draw: A recurrent neural network for image generation, 1462 Graves, 2005, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Netw., 18, 602, 10.1016/j.neunet.2005.06.042 Bahdanau, 2014 Sutskever, 2014, Sequence to sequence learning with neural networks, 3104 Cho, 2014 Luong, 2015 Vaswani, 2017, Attention is all you need, 5998 K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778. Ba, 2016 Pan, 2009, A survey on transfer learning, IEEE Trans. Knowl. Data Eng., 22, 1345, 10.1109/TKDE.2009.191 Simonyan, 2014 Mikolov, 2013 Mikolov, 2013, Distributed representations of words and phrases and their compositionality, 3111 M. Baroni, G. Dinu, G. Kruszewski, Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014, pp. 238–247. J. Pennington, R. Socher, C.D. Manning, Glove: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543. Deerwester, 1990, Indexing by latent semantic analysis, J. Am. Soc. Inf. Sci., 41, 391, 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9 McCann, 2017 Peters, 2018 Weaver, 1949, Translation Moro, 2014, Entity linking meets word sense disambiguation: A unified approach, Trans. Assoc. Comput. Linguist., 2, 231, 10.1162/tacl_a_00179 Jawahar, 2018, ELMoLex: Connecting ELMo and lexicon features for dependency parsing, 1 Hochreiter, 1991, Untersuchungen zu dynamischen neuronalen netzen, Diploma Tech. Univ. München, 91 Hochreiter, 2001 Radford, 2018 Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, S. Fidler, Aligning books and movies: Towards story-like visual explanations by watching movies and reading books, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 19–27. K. Papineni, S. Roukos, T. Ward, J. Henderson, F. Reeder, Corpus-based comprehensive and diagnostic MT evaluation: initial Arabic, Chinese, French, and Spanish results, in: Proceedings of the Second International Conference on Human Language Technology Research, 2002, pp. 132–137. Liu, 2018 Rocktäschel, 2015 Radford, 2019, Language models are unsupervised multitask learners, OpenAI Blog, 1, 9 Zhu, 2018 Alberti, 2019 C. Qu, L. Yang, M. Qiu, W.B. Croft, Y. Zhang, M. Iyyer, BERT with history answer embedding for conversational question answering, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 1133–1136. Liu, 2019 Liu, 2019 Zhang, 2019 Sennrich, 2015 Brown, 2020 Kaplan, 2020 Devlin, 2018 Taylor, 1953, “Cloze procedure”: A new tool for measuring readability, J. Q., 30, 415 Wu, 2016 Liu, 2019 Trinh, 2018 S. Nagel, URL http://web.archive.org/save/http://commoncrawl.org/2016/10/newsdataset. A. Gokaslan, V. Cohen, URL http://web.archive.org/save/http://Skylion007.github.io/OpenWebTextCorpus. Reimers, 2019 Sanh, 2019 C. Buciluǎ, R. Caruana, A. Niculescu-Mizil, Model compression, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 535–541. Hinton, 2015 Lan, 2019 Hou, 2020 L. Yang, M. Zhang, C. Li, M. Bendersky, M. Najork, Beyond 512 tokens: Siamese multi-depth transformer-based hierarchical encoder for long-form document matching, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 1725–1734. Fedus, 2021 He, 2021 Yang, 2019, Xlnet: Generalized autoregressive pretraining for language understanding, Adv. Neural Inf. Process. Syst., 32 Raffel, 2020 Clark, 2020 Lee-Thorp, 2021 K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: A method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002, pp. 311–318. C. Callison-Burch, M. Osborne, P. Koehn, Re-evaluating the role of BLEU in machine translation research, in: 11th Conference of the European Chapter of the Association for Computational Linguistics, 2006, pp. 249–256. Rajpurkar, 2016 Lai, 2017 Zellers, 2018 Wang, 2018 McCann, 2018 Bengio, 1994, Learning long-term dependencies with gradient descent is difficult, IEEE Trans. Neural Netw., 5, 157, 10.1109/72.279181 Tang, 2016, Sequence-to-sequence model with attention for time series classification, 503 Harmon, 2018 Chiu, 2018, State-of-the-art speech recognition with sequence-to-sequence models, 4774 Zhou, 2018, A comparison of modeling units in sequence-to-sequence speech recognition with the transformer on mandarin chinese, 210 Mangal, 2019 Kotecha, 2018 F., 2020