Spanish Corpora of tweets about COVID-19 vaccination for automatic stance detection

Information Processing & Management, Volume 60, Article 103294, 2023
Rubén Yáñez Martínez1, Guillermo Blanco1,2,3, Anália Lourenço1,2,3
1Universidade de Vigo, Department of Computer Science, ESEI-Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n 32004 Ourense, Spain
2CINBIO, The Biomedical Research Centre, Universidade de Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
3SING, Next Generation Computer Systems Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain

References

Fleiss, 1971, Measuring nominal scale agreement among many raters, Psychological Bulletin, 76, 378, 10.1037/h0031619
Landis, 1977, The measurement of observer agreement for categorical data, Biometrics, 33, 159, 10.2307/2529310
Cortes, 1995, Support-Vector Networks, 20, 273
Breiman, 2001, Random Forests, 45, 5
Rennie, 2003, Tackling the poor assumptions of naive Bayes text classifiers
Özgür, A., Özgür, L., & Güngör, T. (2005). Text categorization with class-based and Corpus-based keyword selection. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3733 LNCS, 606–615. https://doi.org/10.1007/11569596_63.
Campello, 2013, Density-based clustering based on hierarchical density estimates, 160
Le, 2014, Distributed representations of sentences and documents, 4, 2931
Moulavi, D., Jaskowiak, P.A., Campello, R.J.G.B., Zimek, A., & Sander, J. (2014). Density-based clustering validation.
da Silva, 2016, Using unsupervised information to improve semi-supervised tweet sentiment classification, Information Sciences, 348, 10.1016/j.ins.2016.02.002
Misra, A., Ecker, B., Handleman, T., Hahn, N., & Walker, M. (2016). NLDS-UCSC at SemEval-2016 Task 6: A semi-supervised approach to detecting stance in tweets. Proceeding, 420–427.
Mohammad, S.M., Kiritchenko, S., Sobhani, P., Zhu, X., & Cherry, C. (2016). SemEval-2016 Task 6: Detecting Stance in Tweets. 31–41. http://alt.qcri.org/semeval2016/task6/.
Nakov, 2016, Developing a successful SemEval task in sentiment analysis of Twitter and other social media texts, Language Resources and Evaluation, 50, 35, 10.1007/s10579-015-9328-1
Stasis, 2016, Semantically controlled adaptive equalisation in reduced dimensionality parameter space, Applied Sciences, 6, 116, 10.3390/app6040116
Conneau, A., Kiela, D., Schwenk, H., Barrault, L., & Bordes, A. (2017). Supervised learning of universal sentence representations from natural language inference data. https://arxiv.org/abs/1705.02364.
Darwish, 2019, Unsupervised user stance detection on Twitter, 141
Jo, 2019, Delta-training: Simple semi-supervised text classification using pretrained word embeddings, 3458
Reimers, 2019, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, 3982
Abd-Alrazaq, 2020, Top concerns of tweeters during the COVID-19 pandemic: Infoveillance study, Journal of Medical Internet Research, 22, e19016, 10.2196/19016
Aiello, 2020, Social media– and internet-based disease surveillance for public health, Annual Review of Public Health, 41, 101, 10.1146/annurev-publhealth-040119-094402
Cañete, J., Chaperon, G., Fuentes, R., Ho, J.-H., Kang, H., & Pérez, J. (2020). Spanish pre-trained Bert model and evaluation data. Practical ML for Developing Countries Workshop @ICLR 2020. https://github.com/josecannete/spanish-corpora.
Conforti, C., Berndt, J., Pilehvar, M.T., Giannitsarou, C., Toxvaerd, F., & Collier, N. (2020). Will-they-won't-they: A very large dataset for stance detection on Twitter. 1715–1724. https://doi.org/10.18653/v1/2020.acl-main.157.
Evrard, M., Uro, R., Hervé, N., & Mazoyer, B. (2020). French Tweet Corpus for automatic stance detection. 11–16.
Giasemidis, 2020, A semi-supervised approach to message stance classification, IEEE Transactions on Knowledge and Data Engineering, 32, 1, 10.1109/TKDE.2018.2880192
Giorgioni, S., Politi, M., Salman, S., Croce, D., & Basili, R. (2020). UNITOR @ Sardistance2020: Combining transformer-based architectures and transfer learning for robust stance detection. https://en.wikipedia.org/wiki/Sardines_movement.
Küçük, 2020, Stance detection, ACM Computing Surveys (CSUR), 53, 10.1145/3369026
Kunneman, 2020, Monitoring stance towards vaccination in twitter messages, BMC Medical Informatics and Decision Making, 20, 1, 10.1186/s12911-020-1046-y
Mcinnes, L., Healy, J., & Melville, J. (2020). UMAP: Uniform manifold approximation and projection for dimension reduction.
Roesslein, J. (2020). Tweepy: Twitter for Python! https://github.com/tweepy/tweepy.
Sancheti, A., Chawla, K., & Verma, G. (2020). LynyrdSkynyrd at WNUT-2020 Task 2: Semi-supervised learning for identification of informative COVID-19 English Tweets. https://arxiv.org/abs/2009.03849.
Zotova, 2020, Multilingual stance detection in Tweets: The Catalonia Independence Corpus - ACL Anthology
Agerri, R., Centeno, R., Espinosa, M., Fernandez De Landa, J., & Rodrigo, A. (2021). VaxxStance@IberLEF 2021: Overview of the task on going beyond text in cross-lingual stance detection. https://doi.org/10.26342/2021-67-15.
Al-Ghadir, 2021, A novel approach to stance detection in social media tweets by fusing ranked lists and sentiments, Information Fusion, 67, 29, 10.1016/j.inffus.2020.10.003
Al-Laith, 2021, AraSenCorpus: A semi-supervised approach for sentiment annotation of a large arabic text corpus, Applied Sciences 2021, 11, 2434
ALDayel, 2021, Stance detection on social media: State of the art and trends, Information Processing & Management, 58, 10.1016/j.ipm.2021.102597
Alsafari, 2021, Semi-supervised self-training of hate and offensive speech from social media, Applied Artificial Intelligence, 10.1080/08839514.2021.1988443
Chawla, 2021, Predictors and outcomes of individual knowledge on early-stage pandemic: Social media, information credibility, public opinion, and behaviour in a large-scale global study, Information Processing & Management, 58, 10.1016/j.ipm.2021.102720
Chen, 2021, Social media use for health purposes: systematic review, Journal of Medical Internet Research, 23, e17917, 10.2196/17917
Herrera-Peco, 2021, Antivaccine movement and COVID-19 Negationism: A content analysis of Spanish-written messages on Twitter, Vaccines, 9, 656, 10.3390/vaccines9060656
Kaushal, A., Saha, A., & Ganguly, N. (2021). tWT–WT: A Dataset to Assert the Role of Target Entities for Detecting Stance of Tweets. 3879–3889. https://doi.org/10.18653/V1/2021.NAACL-MAIN.303.
Kumari, 2021, Misinformation detection using multitask learning with mutual learning for novelty detection and emotion recognition, Information Processing & Management, 58, 10.1016/j.ipm.2021.102631
Meng, 2021, PND66 topic landscape analysis of Reddit social media submissions in insomnia, Value in Health, 24, S171, 10.1016/j.jval.2021.04.850
Murakami, 2021, Neural topic models for short text using pretrained word embeddings and its application to real data, 146
Santoveña-Casal, 2021, Digital citizens’ feelings in national #Covid 19 campaigns in Spain, Heliyon, 7, e08112, 10.1016/j.heliyon.2021.e08112
Suarez-Lledo, 2021, Prevalence of health misinformation on social media: Systematic review, Journal of Medical Internet Research, 23, 10.2196/17187
Zhao, 2021, A neural topic model with word vectors and entity vectors for short texts, Information Processing & Management, 58, 10.1016/j.ipm.2020.102455
Zhou, 2021, Characterizing the dissemination of misinformation on social media in health emergencies: An empirical study based on COVID-19, Information Processing & Management, 58, 10.1016/j.ipm.2021.102554
Alkhalifa, 2022, Capturing stance dynamics in social media: Open challenges and research directions, International Journal of Digital Humanities, 10.1007/s42803-022-00043-w
Dutta, 2022, Semi-supervised stance detection of tweets via distant network supervision, 241
Kumari, 2022, What the fake? Probing misinformation detection standing on the shoulder of novelty and emotion, Information Processing & Management, 59, 10.1016/j.ipm.2021.102740
Pan, 2022, A probabilistic framework for integrating sentence-level semantics via BERT into pseudo-relevance feedback, Information Processing & Management, 59, 10.1016/j.ipm.2021.102734
Roy, 2022, gDART: Improving rumor verification in social media with Discrete Attention Representations, Information Processing & Management, 59, 10.1016/j.ipm.2022.102927
Salmi, 2022, Detecting changes in help seeker conversations on a suicide prevention helpline during the COVID-19 pandemic: In-depth analysis using encoder representations from transformers, BMC Public Health, 22, 530, 10.1186/s12889-022-12926-2
Cer, 2018, 169