Ridge count thresholding to uncover coordinated networks during onset of the Covid-19 pandemic
Abstract
To combat information operations (IO) and disinformation campaigns, one must examine the behavior of the accounts pushing specific narratives and stories through social media, not the content itself. In this work, we present a new process that uses ridge count thresholding (RCT) to extract tweet storms and uncover networks of accounts acting in a coordinated fashion. We started with a dataset of 60 million individual tweets from the early weeks of the Covid-19 pandemic. We extracted coherent topics from these data by testing three different preprocessing pipelines and applying Orthogonal Nonnegative Matrix Factorization (ONMF). The most effective pipeline used hashtag preclustering to downselect the full dataset to the 7 million tweets that contained the top hashtags. Each topic identified by ONMF is described by a topic-tweet signal built from the time stamp in each tweet's metadata. We broke these signals down into tweet storms using RCT, which is calculated from the Dynamic Wavelet Fingerprint transform of each topic-tweet signal. Each tweet storm describes a period of increased activity around a topic and thus reflects some behavior in the underlying network. In total, we identified 39,817 tweet storms containing about 2 million unique tweets. We then used these tweet storms to identify networks of accounts that commonly co-occur within them, isolating the communities most responsible for driving narratives and pushing stories through social media. Through this process, we identified 22 unique networks of accounts that were densely connected based on RCT tweet storm identification. Many of these networks exhibit obvious inauthentic behaviors and are potentially part of an IO campaign.