Evolution of topics and hate speech in retweet network communities

Applied Network Science - Volume 6 - Pages 1-20 - 2021
Bojan Evkoski1,2, Nikola Ljubešić1,3, Andraž Pelicon1,2, Igor Mozetič1, Petra Kralj Novak1,4
1Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
2Jozef Stefan International Postgraduate School, Ljubljana, Slovenia
3Faculty of Information and Communication Sciences, University of Ljubljana, Ljubljana, Slovenia
4Central European University, Vienna, Austria

Abstract

Twitter data exhibits several dimensions worth exploring: a network dimension in the form of links between users, the textual content of the posted tweets, and a temporal dimension in the form of the time-stamped sequence of tweets and their retweets. In this paper, we combine analyses along all three dimensions: the temporal evolution of retweet networks and communities, and content in terms of both hate speech and discussion topics. We apply the methods to a comprehensive set of all Slovenian tweets collected in the years 2018–2020. We find that politics and ideology are the prevailing topics despite the emergence of the Covid-19 pandemic. These two topics also attract the highest proportion of unacceptable tweets. Over time, the membership of retweet communities changes, but their topic distribution remains remarkably stable. Some retweet communities are strongly linked by external retweet influence and form super-communities. Super-community membership closely corresponds to the topic distribution: communities from the same super-community have very similar topic distributions, while communities from different super-communities differ substantially in their discussion topics. However, we also find that even communities from the same super-community differ considerably in the proportion of unacceptable tweets they post.
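
To make the notion of communities being "similar by topic distribution" concrete, the following minimal sketch compares topic distributions with the Jensen-Shannon divergence, a common choice for comparing probability distributions. The abstract does not state which measure is used, so the choice of measure is an assumption here, and the community names and topic proportions are hypothetical values for illustration only, not results from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic proportions (e.g. politics, ideology, health, sports).
# These numbers are invented for illustration.
community_a = [0.40, 0.30, 0.20, 0.10]
community_b = [0.38, 0.32, 0.18, 0.12]
community_c = [0.05, 0.05, 0.30, 0.60]

print("A vs B:", js_divergence(community_a, community_b))
print("A vs C:", js_divergence(community_a, community_c))
```

Under this measure, a value near 0 indicates nearly identical topic distributions, and a value near 1 indicates almost no overlap in topics.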
