A review on the attention mechanism of deep learning

Neurocomputing, Volume 452, Pages 48–62, 2021
Zhaoyang Niu (1), Guoqiang Zhong (1), Hui Yu (2)
(1) Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
(2) School of Creative Technologies, University of Portsmouth, Portsmouth PO1 2DJ, UK

Abstract

Keywords


References

Rensink, 2000, The dynamic representation of scenes, Visual Cogn., 7, 17, 10.1080/135062800394667

Corbetta, 2002, Control of goal-directed and stimulus-driven attention in the brain, Nat. Rev. Neurosci., 3, 201, 10.1038/nrn755

Tsotsos, 1995, Modeling visual attention via selective tuning, Artif. Intell., 78, 507, 10.1016/0004-3702(95)00025-9

Hochreiter, 1997, Long short-term memory, Neural Comput., 9, 1735, 10.1162/neco.1997.9.8.1735

K. Cho, B. van Merrienboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, in: EMNLP, ACL, 2014, pp. 1724–1734.

Itti, 1998, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell., 20, 1254, 10.1109/34.730558

V. Mnih, N. Heess, A. Graves, K. Kavukcuoglu, Recurrent models of visual attention, in: NIPS, 2014, pp. 2204–2212.

D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: ICLR, 2015.

K. Xu, J. Ba, R. Kiros, K. Cho, A.C. Courville, R. Salakhutdinov, R.S. Zemel, Y. Bengio, Show, attend and tell: neural image caption generation with visual attention, in: ICML, Volume 37 of JMLR Workshop and Conference Proceedings, JMLR.org, 2015, pp. 2048–2057.

J. Lu, C. Xiong, D. Parikh, R. Socher, Knowing when to look: adaptive attention via a visual sentinel for image captioning, in: CVPR, IEEE Computer Society, 2017, pp. 3242–3250.

Liu, 2019, Bidirectional LSTM with attention mechanism and convolutional layer for text classification, Neurocomputing, 337, 325, 10.1016/j.neucom.2019.01.078

Li, 2019, Improving user attribute classification with text and social network attention, Cogn. Comput., 11, 459, 10.1007/s12559-019-9624-y

I. Sutskever, O. Vinyals, Q.V. Le, Sequence to sequence learning with neural networks, in: NIPS, 2014, pp. 3104–3112.

T. Luong, H. Pham, C.D. Manning, Effective approaches to attention-based neural machine translation, in: EMNLP, The Association for Computational Linguistics, 2015, pp. 1412–1421.

D. Britz, A. Goldie, M. Luong, Q.V. Le, Massive exploration of neural machine translation architectures, CoRR abs/1703.03906 (2017).

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: NIPS, 2017, pp. 5998–6008.

S. Song, C. Lan, J. Xing, W. Zeng, J. Liu, An end-to-end spatio-temporal attention model for human action recognition from skeleton data, in: AAAI, AAAI Press, 2017, pp. 4263–4270.

Tian, 2019, Densely connected attentional pyramid residual network for human pose estimation, Neurocomputing, 347, 13, 10.1016/j.neucom.2019.01.104

A. Zhao, L. Qi, J. Li, J. Dong, H. Yu, LSTM for diagnosis of neurodegenerative diseases using gait data, in: H. Yu, J. Dong (Eds.), Ninth International Conference on Graphic and Image Processing (ICGIP 2017), vol. 10615, p. 106155B.

P. Zhang, J. Xue, C. Lan, W. Zeng, Z. Gao, N. Zheng, Adding attentiveness to the neurons in recurrent neural networks, in: ECCV (9), Volume 11213 of Lecture Notes in Computer Science, Springer, 2018, pp. 136–152.

Song, 2018, Boosting image sentiment analysis with visual attention, Neurocomputing, 312, 218, 10.1016/j.neucom.2018.05.104

Yan, 2021, Deep multi-view learning methods: a review, Neurocomputing, 10.1016/j.neucom.2021.03.090

J. Chorowski, D. Bahdanau, K. Cho, Y. Bengio, End-to-end continuous speech recognition using attention-based recurrent NN: first results, CoRR abs/1412.1602 (2014).

W. Chan, N. Jaitly, Q.V. Le, O. Vinyals, Listen, attend and spell: a neural network for large vocabulary conversational speech recognition, in: ICASSP, IEEE, 2016, pp. 4960–4964.

M. Sperber, J. Niehues, G. Neubig, S. Stüker, A. Waibel, Self-attentional acoustic models, in: INTERSPEECH, ISCA, 2018, pp. 3723–3727.

S. Wang, L. Hu, L. Cao, X. Huang, D. Lian, W. Liu, Attention-based transactional context embedding for next-item recommendation, in: AAAI, AAAI Press, 2018, pp. 2532–2539.

H. Ying, F. Zhuang, F. Zhang, Y. Liu, G. Xu, X. Xie, H. Xiong, J. Wu, Sequential recommender system based on hierarchical attention networks, in: IJCAI, ijcai.org, 2018, pp. 3926–3932.

P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio, Graph attention networks, in: ICLR (Poster), OpenReview.net, 2018.

K. Xu, L. Wu, Z. Wang, Y. Feng, V. Sheinin, Graph2seq: graph to sequence learning with attention-based neural networks, CoRR abs/1804.00823 (2018).

Z. Lin, M. Feng, C.N. dos Santos, M. Yu, B. Xiang, B. Zhou, Y. Bengio, A structured self-attentive sentence embedding, in: ICLR (Poster), OpenReview.net, 2017.

K. Zhang, G. Zhong, J. Dong, S. Wang, Y. Wang, Stock market prediction based on generative adversarial network, in: R. Bie, Y. Sun, J. Yu (Eds.), 2018 International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2018, Beijing, China, October 19–21, 2018, Volume 147 of Procedia Computer Science, Elsevier, 2018, pp. 400–406.

Ieracitano, 2021, A novel automatic classification system based on hybrid unsupervised and supervised machine learning for electrospun nanofibers, IEEE CAA J. Autom. Sinica, 8, 64, 10.1109/JAS.2020.1003387

Z. Fan, G. Zhong, H. Li, A feature fusion network for multi-modal mesoscale eddy detection, in: H. Yang, K. Pasupa, A.C. Leung, J.T. Kwok, J.H. Chan, I. King (Eds.), Neural Information Processing – 27th International Conference, ICONIP 2020, Bangkok, Thailand, November 23–27, 2020, Proceedings, Part I, volume 12532 of Lecture Notes in Computer Science, Springer, 2020, pp. 51–61.

Yu, 2015, A framework for automatic and perceptually valid facial expression generation, Multimedia Tools Appl., 74, 9427, 10.1007/s11042-014-2125-9

Q. Li, Z. Fan, G. Zhong, Bednet: bi-directional edge detection network for ocean front detection, in: H. Yang, K. Pasupa, A.C. Leung, J.T. Kwok, J.H. Chan, I. King (Eds.), Neural Information Processing – 27th International Conference, ICONIP 2020, Bangkok, Thailand, November 18–22, 2020, Proceedings, Part IV, volume 1332 of Communications in Computer and Information Science, Springer, 2020, pp. 312–319.

Z. Fan, G. Zhong, H. Wei, H. Li, Ednet: a mesoscale eddy detection network with multi-modal data, in: 2020 International Joint Conference on Neural Networks, IJCNN 2020, Glasgow, United Kingdom, July 19–24, 2020, IEEE, 2020, pp. 1–7.

Liu, 2020, Region based parallel hierarchy convolutional neural network for automatic facial nerve paralysis evaluation, IEEE Trans. Neural Syst. Rehab. Eng., 28, 2325, 10.1109/TNSRE.2020.3021410

Yue, 2021, An optimally weighted user- and item-based collaborative filtering approach to predicting baseline data for Friedreich’s ataxia patients, Neurocomputing, 419, 287, 10.1016/j.neucom.2020.08.031

Zeng, 2020, Deep-reinforcement-learning-based images segmentation for quantitative analysis of gold immunochromatographic strip, Neurocomputing

Liu, 2019, A novel particle swarm optimization approach for patient clustering from emergency departments, IEEE Trans. Evol. Comput., 23, 632, 10.1109/TEVC.2018.2878536

Zeng, 2019, An improved particle filter with a novel hybrid proposal distribution for quantitative analysis of gold immunochromatographic strips, IEEE Trans. Nanotechnol., 18, 819, 10.1109/TNANO.2019.2932271

Ming, 2021, Deep learning for monocular depth estimation: a review, Neurocomputing, 438, 14, 10.1016/j.neucom.2020.12.089

Xia, 2019, Accurate and robust eye center localization via fully convolutional networks, IEEE CAA J. Autom. Sinica, 6, 1127, 10.1109/JAS.2019.1911684

Guo, 2020, Real-time facial affective computing on mobile devices, Sensors, 20, 870, 10.3390/s20030870

Wang, 2021, Cascade regression-based face frontalization for dynamic facial expression analysis, Cogn. Comput., 1

Zhang, 2020, Scene perception guided crowd anomaly detection, Neurocomputing, 414, 291, 10.1016/j.neucom.2020.07.019

Roy, 2021, Discriminative dictionary design for action classification in still images and videos, Cogn. Comput., 10.1007/s12559-021-09851-8

Liu, 2021, Deep learning in sheet metal bending with a novel theory-guided deep neural network, IEEE/CAA J. Autom. Sinica, 8, 565, 10.1109/JAS.2021.1003871

Luque Sánchez, 2020, Revisiting crowd behaviour analysis through deep learning: taxonomy, anomaly detection, crowd emotions, datasets, opportunities and prospects, Inf. Fusion, 64, 318, 10.1016/j.inffus.2020.07.008

Zhang, 2021, Crowd emotion evaluation based on fuzzy inference of arousal and valence, Neurocomputing, 445, 194, 10.1016/j.neucom.2021.02.047

R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of methods for explaining black box models, ACM Comput. Surv. 51 (2019) 93:1–93:42.

S. Jain, B.C. Wallace, Attention is not explanation, in: NAACL-HLT (1), Association for Computational Linguistics, 2019, pp. 3543–3556.

S. Serrano, N.A. Smith, Is attention interpretable?, in: ACL (1), Association for Computational Linguistics, 2019, pp. 2931–2951.

L.H. Li, M. Yatskar, D. Yin, C. Hsieh, K. Chang, What does BERT with vision look at?, in: ACL, Association for Computational Linguistics, 2020, pp. 5265–5275.

G. Letarte, F. Paradis, P. Giguère, F. Laviolette, Importance of self-attention for sentiment analysis, in: BlackboxNLP@EMNLP, Association for Computational Linguistics, 2018, pp. 267–275.

S. Vashishth, S. Upadhyay, G.S. Tomar, M. Faruqui, Attention interpretability across NLP tasks, CoRR abs/1909.11218 (2019).

S. Wiegreffe, Y. Pinter, Attention is not not explanation, in: EMNLP/IJCNLP (1), Association for Computational Linguistics, 2019, pp. 11–20.

Schuster, 1997, Bidirectional recurrent neural networks, IEEE Trans. Signal Process., 45, 2673, 10.1109/78.650093

A. Sordoni, P. Bachman, Y. Bengio, Iterative alternating neural attention for machine reading, CoRR abs/1606.02245 (2016).

A. Graves, G. Wayne, I. Danihelka, Neural Turing machines, CoRR abs/1410.5401 (2014).

S. Zhao, Z. Zhang, Attention-via-attention neural machine translation, in: AAAI, AAAI Press, 2018, pp. 563–570.

A. Galassi, M. Lippi, P. Torroni, Attention, please! A critical review of neural attention models in natural language processing, CoRR abs/1902.02181 (2019).

Z. Yang, D. Yang, C. Dyer, X. He, A.J. Smola, E.H. Hovy, Hierarchical attention networks for document classification, in: HLT-NAACL, The Association for Computational Linguistics, 2016, pp. 1480–1489.

A.F.T. Martins, R.F. Astudillo, From softmax to sparsemax: A sparse model of attention and multi-label classification, in: ICML, Volume 48 of JMLR Workshop and Conference Proceedings, JMLR.org, 2016, pp. 1614–1623.

Y. Kim, C. Denton, L. Hoang, A.M. Rush, Structured attention networks, in: ICLR, OpenReview.net, 2017.

A.H. Miller, A. Fisch, J. Dodge, A. Karimi, A. Bordes, J. Weston, Key-value memory networks for directly reading documents, in: EMNLP, The Association for Computational Linguistics, 2016, pp. 1400–1409.

J. Ba, G.E. Hinton, V. Mnih, J.Z. Leibo, C. Ionescu, Using fast weights to attend to the recent past, in: NIPS, 2016, pp. 4331–4339.

Ç. Gülçehre, S. Chandar, K. Cho, Y. Bengio, Dynamic neural Turing machine with soft and hard addressing schemes, CoRR abs/1607.00036 (2016).

M. Daniluk, T. Rocktäschel, J. Welbl, S. Riedel, Frustratingly short attention spans in neural language modeling, in: ICLR (Poster), OpenReview.net, 2017.

Williams, 1992, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn., 8, 229, 10.1007/BF00992696

Hu, 2020, Squeeze-and-excitation networks, IEEE Trans. Pattern Anal. Mach. Intell., 42, 2011, 10.1109/TPAMI.2019.2913372

J. Ba, V. Mnih, K. Kavukcuoglu, Multiple object recognition with visual attention, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings.

M. Jaderberg, K. Simonyan, A. Zisserman, et al., Spatial transformer networks, in: Advances in Neural Information Processing Systems, 2015, pp. 2017–2025.

S. Chaudhari, G. Polatkan, R. Ramanath, V. Mithal, An attentive survey of attention models, CoRR abs/1904.02874 (2019).

J. Lu, J. Yang, D. Batra, D. Parikh, Hierarchical question-image co-attention for visual question answering, in: NIPS, 2016, pp. 289–297.

F. Fan, Y. Feng, D. Zhao, Multi-grained attention network for aspect-level sentiment classification, in: EMNLP, Association for Computational Linguistics, 2018, pp. 3433–3442.

W. Wang, S.J. Pan, D. Dahlmeier, X. Xiao, Coupled multi-layer attentions for co-extraction of aspect and opinion terms, in: AAAI, AAAI Press, 2017, pp. 3316–3322.

Y. Tay, A.T. Luu, S.C. Hui, Hermitian co-attention networks for text matching in asymmetrical domains, in: IJCAI, ijcai.org, 2018, pp. 4425–4431.

Q. Zhang, J. Fu, X. Liu, X. Huang, Adaptive co-attention network for named entity recognition in tweets, in: AAAI, AAAI Press, 2018, pp. 5674–5681.

F. Nie, Y. Cao, J. Wang, C. Lin, R. Pan, Mention and entity description co-attention for entity disambiguation, in: AAAI, AAAI Press, 2018, pp. 5908–5915.

X. Li, K. Song, S. Feng, D. Wang, Y. Zhang, A co-attention neural network model for emotion cause analysis with emotional context awareness, in: EMNLP, Association for Computational Linguistics, 2018, pp. 4752–4757.

Y. Tay, A.T. Luu, S.C. Hui, J. Su, Attentive gated lexicon reader with contrastive contextual co-attention for sentiment classification, in: EMNLP, Association for Computational Linguistics, 2018, pp. 3443–3453.

B. Wang, K. Liu, J. Zhao, Inner attention based recurrent neural networks for answer selection, in: ACL (1), The Association for Computer Linguistics, 2016.

L. Wu, F. Tian, L. Zhao, J. Lai, T. Liu, Word attention for sequence to sequence text understanding, in: AAAI, AAAI Press, 2018, pp. 5578–5585.

J. Pavlopoulos, P. Malakasiotis, I. Androutsopoulos, Deeper attention to abusive user content moderation, in: EMNLP, Association for Computational Linguistics, 2017, pp. 1125–1135.

Z. Li, Y. Wei, Y. Zhang, Q. Yang, Hierarchical attention transfer network for cross-domain sentiment classification, in: AAAI, AAAI Press, 2018, pp. 5852–5859.

X. Wang, R.B. Girshick, A. Gupta, K. He, Non-local neural networks, in: CVPR, IEEE Computer Society, 2018, pp. 7794–7803.

C. Wu, F. Wu, J. Liu, Y. Huang, Hierarchical user and item representation with three-tier attention for recommendation, in: NAACL-HLT (1), Association for Computational Linguistics, 2019, pp. 1818–1826.

T. Xiao, Y. Xu, K. Yang, J. Zhang, Y. Peng, Z. Zhang, The application of two-level attention models in deep convolutional neural network for fine-grained image classification, in: CVPR, IEEE Computer Society, 2015, pp. 842–850.

J. Li, Z. Tu, B. Yang, M. R. Lyu, T. Zhang, Multi-head attention with disagreement regularization, in: EMNLP, Association for Computational Linguistics, 2018, pp. 2897–2903.

T. Shen, T. Zhou, G. Long, J. Jiang, S. Pan, C. Zhang, Disan: directional self-attention network for RNN/CNN-free language understanding, in: AAAI, AAAI Press, 2018, pp. 5446–5455.

J. Du, J. Han, A. Way, D. Wan, Multi-level structured self-attentions for distantly supervised relation extraction, in: EMNLP, Association for Computational Linguistics, 2018, pp. 2216–2225.

S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, K. Saenko, Sequence to sequence – video to text, in: 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7–13, 2015, IEEE Computer Society, 2015, pp. 4534–4542.

Ding, 2019, Neural image caption generation with weighted training and reference, Cogn. Comput., 11, 763, 10.1007/s12559-018-9581-x

Zhang, 2019, Transfer hierarchical attention network for generative dialog system, Int. J. Autom. Comput., 16, 720, 10.1007/s11633-019-1200-0

R. Prabhavalkar, K. Rao, T.N. Sainath, B. Li, L. Johnson, N. Jaitly, A comparison of sequence-to-sequence models for speech recognition, in: F. Lacerda (Ed.), Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20–24, 2017, ISCA, 2017, pp. 939–943.

S. Wang, J. Zhang, C. Zong, Learning sentence representation with guidance of human attention, in: IJCAI, ijcai.org, 2017, pp. 4137–4143.

S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus, End-to-end memory networks, in: NIPS, 2015, pp. 2440–2448.

J. Weston, S. Chopra, A. Bordes, Memory networks, in: ICLR, 2015.

A. Kumar, O. Irsoy, P. Ondruska, M. Iyyer, J. Bradbury, I. Gulrajani, V. Zhong, R. Paulus, R. Socher, Ask me anything: Dynamic memory networks for natural language processing, in: ICML, Volume 48 of JMLR Workshop and Conference Proceedings, JMLR.org, 2016, pp. 1378–1387.

M. Henaff, J. Weston, A. Szlam, A. Bordes, Y. LeCun, Tracking the world state with recurrent entity networks, in: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings, OpenReview.net, 2017.

J. Gehring, M. Auli, D. Grangier, D. Yarats, Y.N. Dauphin, Convolutional sequence to sequence learning, in: ICML, Volume 70 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 1243–1252.

X. Li, W. Wang, X. Hu, J. Yang, Selective kernel networks, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, Computer Vision Foundation / IEEE, 2019, pp. 510–519.

M.F. Stollenga, J. Masci, F.J. Gomez, J. Schmidhuber, Deep networks with internal selective attention through feedback connections, in: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, K.Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8–13 2014, Montreal, Quebec, Canada, pp. 3545–3553.

J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, H. Lu, Dual attention network for scene segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, Computer Vision Foundation / IEEE, 2019, pp. 3146–3154.

Y. Yuan, J. Wang, Ocnet: object context network for scene parsing, CoRR abs/1809.00916 (2018).

H. Zhao, Y. Zhang, S. Liu, J. Shi, C.C. Loy, D. Lin, J. Jia, Psanet: point-wise spatial attention network for scene parsing, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018 – 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part IX, volume 11213 of Lecture Notes in Computer Science, Springer, 2018, pp. 270–286.

Y. Cao, J. Xu, S. Lin, F. Wei, H. Hu, Gcnet: non-local networks meet squeeze-excitation networks and beyond, in: 2019 IEEE/CVF International Conference on Computer Vision Workshops, ICCV Workshops 2019, Seoul, Korea (South), October 27–28, 2019, IEEE, 2019, pp. 1971–1980.

F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, X. Tang, Residual attention network for image classification, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, IEEE Computer Society, 2017, pp. 6450–6458.

K. Yue, M. Sun, Y. Yuan, F. Zhou, E. Ding, F. Xu, Compact generalized non-local network, in: S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 6511–6520.

S. Woo, J. Park, J. Lee, I.S. Kweon, CBAM: convolutional block attention module, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018 – 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII, Volume 11211 of Lecture Notes in Computer Science, Springer, 2018, pp. 3–19.

Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, W. Liu, Ccnet: criss-cross attention for semantic segmentation, in: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, 2019, IEEE, 2019, pp. 603–612.

H. Mi, Z. Wang, A. Ittycheriah, Supervised attentions for neural machine translation, in: J. Su, X. Carreras, K. Duh (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, The Association for Computational Linguistics, 2016, pp. 2283–2288.

L. Liu, M. Utiyama, A.M. Finch, E. Sumita, Neural machine translation with supervised attention, in: N. Calzolari, Y. Matsumoto, R. Prasad (Eds.), COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, December 11–16, 2016, Osaka, Japan, ACL, 2016, pp. 3093–3102.

B. Yang, Z. Tu, D.F. Wong, F. Meng, L.S. Chao, T. Zhang, Modeling localness for self-attention networks, in: EMNLP, Association for Computational Linguistics, 2018, pp. 4449–4458.

S.I. Wang, C.D. Manning, Baselines and bigrams: simple, good sentiment and topic classification, in: The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8–14, 2012, Jeju Island, Korea – Volume 2: Short Papers, The Association for Computer Linguistics, 2012, pp. 90–94.

A.L. Maas, R.E. Daly, P.T. Pham, D. Huang, A.Y. Ng, C. Potts, Learning word vectors for sentiment analysis, in: D. Lin, Y. Matsumoto, R. Mihalcea (Eds.), The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19–24 June, 2011, Portland, Oregon, USA, The Association for Computer Linguistics, 2011, pp. 142–150.

Pang, 2007, Opinion mining and sentiment analysis, Found. Trends Inf. Retr., 2, 1, 10.1561/1500000011

M. Sahami, S. Dumais, D. Heckerman, E. Horvitz, A Bayesian approach to filtering junk e-mail, in: Learning for Text Categorization: Papers from the 1998 workshop, vol. 62, Madison, Wisconsin, pp. 98–105.

Y. Song, J. Wang, T. Jiang, Z. Liu, Y. Rao, Attentional encoder network for targeted sentiment classification, CoRR abs/1902.09314 (2019).

A. Ambartsoumian, F. Popowich, Self-attention: a better building block for sentiment analysis neural network classifiers, in: A. Balahur, S.M. Mohammad, V. Hoste, R. Klinger (Eds.), Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2018, Brussels, Belgium, October 31, 2018, Association for Computational Linguistics, 2018, pp. 130–139.

D. Tang, B. Qin, T. Liu, Aspect level sentiment classification with deep memory network, in: J. Su, X. Carreras, K. Duh (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1–4, 2016, The Association for Computational Linguistics, 2016, pp. 214–224.

P. Zhu, T. Qian, Enhanced aspect level sentiment classification with auxiliary memory, in: E.M. Bender, L. Derczynski, P. Isabelle (Eds.), Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20–26, 2018, Association for Computational Linguistics, 2018, pp. 1077–1087.

Y. Cui, Z. Chen, S. Wei, S. Wang, T. Liu, G. Hu, Attention-over-attention neural networks for reading comprehension, in: R. Barzilay, M. Kan (Eds.), Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30–August 4, Volume 1: Long Papers, Association for Computational Linguistics, 2017, pp. 593–602.

T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Y. Bengio, Y. LeCun (Eds.), 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2–4, 2013, Workshop Track Proceedings.

J. Pennington, R. Socher, C.D. Manning, Glove: global vectors for word representation, in: A. Moschitti, B. Pang, W. Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar, A Meeting of SIGDAT, A Special Interest Group of the ACL, ACL, 2014, pp. 1532–1543.

M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, in: M.A. Walker, H. Ji, A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1–6, 2018, Volume 1 (Long Papers), Association for Computational Linguistics, 2018, pp. 2227–2237.

J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 4171–4186.

A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding by generative pre-training, 2018.

Radford, 2019, Language models are unsupervised multitask learners, OpenAI Blog, 1, 9

T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D.M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, CoRR abs/2005.14165 (2020).

Liu, 2021, A novel randomised particle swarm optimizer, Int. J. Mach. Learn. Cybern., 12, 529, 10.1007/s13042-020-01186-4

Zeng, 2020, A dynamic neighborhood-based switching particle swarm optimization algorithm, IEEE Trans. Cybern., 10.1109/TCYB.2019.2938895

Liu, 2021, A novel sigmoid-function-based adaptive weighted particle swarm optimizer, IEEE Trans. Cybern., 51, 1085, 10.1109/TCYB.2019.2925015

Rahman, 2020, An n-state Markovian jumping particle swarm optimization algorithm, IEEE Trans. Syst., Man, Cybern.: Syst.

Luo, 2020, Position-transitional particle swarm optimization-incorporated latent factor analysis, IEEE Trans. Knowl. Data Eng.

Zeng, 2021, A competitive mechanism integrated multi-objective whale optimization algorithm with differential evolution, Neurocomputing, 432, 170, 10.1016/j.neucom.2020.12.065

J. Li, W. Monroe, D. Jurafsky, Understanding neural networks through representation erasure, CoRR abs/1612.08220 (2016).

E. Voita, D. Talbot, F. Moiseev, R. Sennrich, I. Titov, Analyzing multi-head self-attention: specialized heads do the heavy lifting, the rest can be pruned, in: A. Korhonen, D.R. Traum, L. Màrquez (Eds.), Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, Volume 1: Long Papers, Association for Computational Linguistics, 2019, pp. 5797–5808.

Z. Dai, Z. Yang, Y. Yang, J.G. Carbonell, Q.V. Le, R. Salakhutdinov, Transformer-XL: attentive language models beyond a fixed-length context, in: A. Korhonen, D.R. Traum, L. Màrquez (Eds.), Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, Volume 1: Long Papers, Association for Computational Linguistics, 2019, pp. 2978–2988.

M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, L. Kaiser, Universal transformers, in: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019, OpenReview.net, 2019.

Q. Guo, X. Qiu, P. Liu, Y. Shao, X. Xue, Z. Zhang, Star-transformer, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 1315–1325.

X. Zhu, D. Cheng, Z. Zhang, S. Lin, J. Dai, An empirical study of spatial attention mechanisms in deep networks, in: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, 2019, IEEE, 2019, pp. 6687–6696.

Y. Tay, D. Bahri, D. Metzler, D. Juan, Z. Zhao, C. Zheng, Synthesizer: rethinking self-attention in transformer models, CoRR abs/2005.00743 (2020).

Y.H. Tsai, S. Bai, M. Yamada, L. Morency, R. Salakhutdinov, Transformer dissection: An unified understanding for transformer’s attention via the lens of kernel, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3–7, 2019, Association for Computational Linguistics, 2019, pp. 4343–4352.

A. Katharopoulos, A. Vyas, N. Pappas, F. Fleuret, Transformers are RNNs: fast autoregressive transformers with linear attention, CoRR abs/2006.16236 (2020).

C. Sen, T. Hartvigsen, B. Yin, X. Kong, E.A. Rundensteiner, Human attention maps for text classification: Do humans and neural networks focus on the same words?, in: D. Jurafsky, J. Chai, N. Schluter, J.R. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5–10, 2020, Association for Computational Linguistics, 2020, pp. 4596–4608.