Ridge count thresholding to uncover coordinated networks during onset of the Covid-19 pandemic

Social Network Analysis and Mining - Volume 12 - Pages 1-25 - 2022
Spencer Lee Kirn¹, Mark K. Hinders¹
¹ William and Mary Applied Science Department, Williamsburg, USA

Abstract

To combat information operations (IO) and disinformation campaigns, one must look at the behaviors of the accounts pushing specific narratives and stories through social media, not at the content itself. In this work, we present a new process for extracting tweet storms and uncovering networks of accounts that work in a coordinated fashion, using ridge count thresholding (RCT). We started with a dataset of 60 million individual tweets from the early weeks of the Covid-19 pandemic. Coherent topics were extracted from this data by testing three different preprocessing pipelines and applying Orthogonal Nonnegative Matrix Factorization (ONMF). The most effective preprocessing pipeline used hashtag preclustering to downselect the full dataset to the 7 million tweets that included the top hashtags. Each topic identified by ONMF is described by a topic-tweet signal, constructed from the time stamp included in each tweet’s metadata. These signals were broken down into tweet storms using RCT, which is calculated from the Dynamic Wavelet Fingerprint transform of each topic-tweet signal. Each tweet storm describes a period of increased activity around a topic and represents some behavior in the underlying network. In total, we identified 39,817 tweet storms that included about 2 million unique tweets. These tweet storms were then used to identify networks of accounts that commonly co-occur within tweet storms, isolating the communities most responsible for driving narratives and pushing stories through social media. Through this process, we identified 22 unique networks of accounts that were densely connected based on RCT tweet storm identification. Many of the identified networks exhibit obvious inauthentic behaviors that are potentially part of an IO campaign.
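
The final step described above, linking accounts that commonly co-occur within tweet storms, can be illustrated with a minimal Python sketch. It assumes the tweet storms have already been extracted, so the RCT and Dynamic Wavelet Fingerprint stages are not reproduced here. The function names, the `min_shared` threshold, and the use of connected components to form groups are all assumptions made for illustration, not the procedure used in the paper, and the toy account names are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(storms):
    """Count, for each pair of accounts, how many storms contain both."""
    counts = Counter()
    for accounts in storms:
        # Deduplicate within a storm and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(accounts)), 2):
            counts[(a, b)] += 1
    return counts


def coordinated_groups(storms, min_shared=2):
    """Group accounts linked by at least `min_shared` shared storms.

    Assumption: groups are the connected components of the thresholded
    co-occurrence graph. This is a simple stand-in for whatever community
    detection a real analysis would use.
    """
    adjacency = defaultdict(set)
    for (a, b), n in cooccurrence_edges(storms).items():
        if n >= min_shared:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Toy data (hypothetical): each storm is the list of accounts that tweeted in it.
storms = [
    ["acct_a", "acct_b", "acct_c"],
    ["acct_a", "acct_b", "acct_d"],
    ["acct_b", "acct_c", "acct_a"],
    ["acct_e", "acct_f"],
    ["acct_e", "acct_f", "acct_d"],
]
groups = coordinated_groups(storms, min_shared=2)
print(groups)
```

In this toy input, `acct_d` appears in storms alongside both clusters but never shares more than one storm with any single account, so the threshold leaves it out of every group. Raising `min_shared` trades recall for confidence that a link reflects repeated rather than incidental co-occurrence.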
