Design and analysis of tweet-based election models for the 2021 Mexican legislative election
Abstract
Modelling and forecasting real-life human behaviour using online social media is an active area of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory for gauging and predicting social behaviour. Over the last decade, Twitter's user base has grown and become more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican legislative election, using a dataset of 15 million election-related tweets posted in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using tweets with geographical attributes determine the election results with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the subset of tweets with geographical attributes is positively correlated with official census data on population and internet usage in Mexico. These findings suggest that we have reached a point at which online activity, appropriately curated, can provide an accurate representation of offline behaviour.
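As a rough illustration of the two kinds of quantity the abstract refers to, the following minimal sketch assumes that each tweet has already been labelled as supporting either the ruling parties or the opposition. The labelling step, the function names (`bloc_shares`, `absolute_error_pp`, `pearson`), and all numbers are hypothetical and are not taken from the paper. The sketch computes a naive tweet-share estimate of the ruling bloc's vote share and its error against an official result. It also computes the Pearson correlation between per-region counts of geolocated tweets and census population. It is the simplest tweet-counting baseline, not a reproduction of the paper's models.

```python
from collections import Counter
from math import sqrt

# Hypothetical bloc labels; the paper's own labelling scheme is not reproduced here.
RULING = "ruling"
OPPOSITION = "opposition"


def bloc_shares(labels):
    """Naive tweet-share model: fraction of labelled tweets assigned to each bloc."""
    counts = Counter(labels)
    total = counts[RULING] + counts[OPPOSITION]
    if total == 0:
        raise ValueError("no tweets labelled as ruling or opposition")
    return {RULING: counts[RULING] / total, OPPOSITION: counts[OPPOSITION] / total}


def absolute_error_pp(predicted, official):
    """Absolute error of the predicted ruling-bloc share, in percentage points."""
    return abs(predicted[RULING] - official[RULING]) * 100


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only (not data from the paper).
    labels = [RULING] * 6 + [OPPOSITION] * 4
    predicted = bloc_shares(labels)
    official = {RULING: 0.55, OPPOSITION: 0.45}
    print(predicted)  # {'ruling': 0.6, 'opposition': 0.4}
    print(round(absolute_error_pp(predicted, official), 1))  # 5.0

    # Hypothetical per-region counts of geolocated tweets vs. census population.
    geo_tweets = [120, 340, 80, 560]
    population = [1.1e6, 3.0e6, 0.9e6, 5.2e6]
    print(round(pearson(geo_tweets, population), 3))
```

This baseline ignores geography entirely. A geographically aware variant could apply `bloc_shares` per region before aggregating, but how the paper defines or weights its geographical models is not specified here.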