Design and analysis of tweet-based election models for the 2021 Mexican legislative election

Springer Science and Business Media LLC - Volume 12 - Pages 1-17 - 2023
Alejandro Vigna-Gómez (1, 2, 3), Javier Murillo (2, 4), Manelik Ramirez (5, 4), Alberto Borbolla (4), Ian Márquez (6, 4), Prasun K. Ray (7)
(1) Niels Bohr International Academy, Niels Bohr Institute, Copenhagen, Denmark
(2) The Aspen Institute México, Mexico City, Mexico
(3) Max-Planck Institut für Astrophysik, Garching, Germany
(4) Ciencia de Datos & Tecnología, Estado de México, Mexico
(5) Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
(6) Facultad de Negocios, Universidad La Salle México, Mexico City, Mexico
(7) Department of Mathematics, Imperial College London, London, United Kingdom

Abstract

Modelling and forecasting real-life human behaviour using online social media is an active area of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory for gauging and predicting social behaviour. During the last decade, Twitter's user base has grown and become more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican legislative election. To do so, we use a dataset of 15 million election-related tweets posted in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.
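The abstract only outlines the models, so the following Python sketch is purely illustrative. It shows one way a user-level model could turn tweets already labelled as pro-ruling or pro-opposition into a two-sided split, and how a Pearson correlation between, say, per-state geolocated tweet counts and census figures could be computed. The majority-vote rule, the tie handling, the assumption that tweets arrive pre-labelled, and the names `user_level_shares` and `pearson_r` are all hypothetical choices made for this example; they are not the authors' method.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical side labels; assumes each tweet was already classified upstream.
RULING, OPPOSITION = "ruling", "opposition"


def user_level_shares(tweet_labels):
    """One-user-one-vote shares from tweet-level labels (illustrative only).

    tweet_labels: iterable of (user_id, side) pairs, side in {RULING, OPPOSITION}.
    Each user is assigned the side of the majority of their labelled tweets;
    users whose tweets are evenly split are left out.
    """
    per_user = defaultdict(Counter)
    for user_id, side in tweet_labels:
        per_user[user_id][side] += 1

    votes = Counter()
    for counts in per_user.values():
        if counts[RULING] > counts[OPPOSITION]:
            votes[RULING] += 1
        elif counts[OPPOSITION] > counts[RULING]:
            votes[OPPOSITION] += 1
        # A tie contributes no "vote".

    total = votes[RULING] + votes[OPPOSITION]
    if total == 0:
        raise ValueError("no user could be assigned a side")
    return {side: votes[side] / total for side in (RULING, OPPOSITION)}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Toy input, not data from the study: user "c" is tied and is dropped.
    toy = [("a", RULING), ("a", RULING), ("b", OPPOSITION),
           ("c", RULING), ("c", OPPOSITION)]
    print(user_level_shares(toy))  # {'ruling': 0.5, 'opposition': 0.5}
```

Counting users rather than tweets is one design choice for limiting the weight of a few very active accounts; a tweet-count model would instead sum labels directly. Either variant would still need the geographical filtering and bias corrections that the abstract alludes to before it could be compared with official results.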
