Concerns about the reliability of traditional polling have grown in recent years, prompting investigation into alternative ways of gauging public opinion. This article examines how well Twitter data predicts the outcomes of Mexican legislative elections. By comparing Twitter activity with official election results and census data, we show that models built on geolocated Twitter data predict Mexican elections more accurately and precisely than conventional polling techniques.
Twitter as a Polling Alternative and Open-Source Intelligence
Traditional phone-based polls face increasing challenges in maintaining accuracy. Our study explores how well archival Twitter data reflects real-world political preferences and regional engagement. We found that bipartisan models of Mexican elections that use geodata are more accurate and precise than conventional polling methods, including sophisticated Bayesian modeling of aggregated polls. This finding is consistent with previous research indicating that Twitter data not only aligns with national polling aggregates but can precede them by several days, which points to the ability of social media to capture public sentiment in near real time.
Online polling offers clear advantages in speed, cost, and reach. Our approach, however, differs from direct polling. Instead of soliciting opinions directly, we construct contextual queries around relevant topics and analyze the retrieved data, mirroring open-source intelligence (OSINT) methodologies. A primary criticism of using online resources like Twitter for prediction is that the data are inherently biased. OSINT-like methods can nonetheless mitigate some biases associated with traditional polling, such as social desirability bias, where respondents skew their answers to appear more socially acceptable.
Nevertheless, both conventional polling and OSINT-based methods share systematic uncertainties. Over-reporting, where individuals participate in polls or online discussions without actually voting, is one such issue. Furthermore, a simplistic volumetric approach, directly correlating tweet volume with vote counts, has shown inconsistent predictive success. Our study incorporates models that estimate vote share based on the inferred voting intent of unique accounts, thereby weighting the average opinion of both highly active and less active users equally.
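The difference between a volumetric estimate and one based on unique accounts can be illustrated with a minimal Python sketch. The tweet records, the bloc labels "A" and "B", and both function names are hypothetical. The study's actual intent-inference step is not reproduced here, and the majority rule used to assign each account a single bloc is an assumption made only for illustration.

```python
from collections import defaultdict

# Hypothetical labelled tweets: (account id, bloc the tweet is inferred to support).
tweets = [
    ("u1", "A"), ("u1", "A"), ("u1", "A"), ("u1", "A"),
    ("u2", "B"),
    ("u3", "B"),
]


def volumetric_share(tweets):
    """Share of tweets per bloc; prolific accounts dominate the result."""
    counts = defaultdict(int)
    for _, bloc in tweets:
        counts[bloc] += 1
    total = sum(counts.values())
    return {bloc: n / total for bloc, n in counts.items()}


def user_level_share(tweets):
    """Share of unique accounts per bloc; every account counts once.

    Assumption: an account is assigned to the bloc it supports most often.
    """
    per_user = defaultdict(lambda: defaultdict(int))
    for user, bloc in tweets:
        per_user[user][bloc] += 1

    votes = defaultdict(int)
    for blocs in per_user.values():
        # Sorting first makes tie-breaking deterministic (alphabetical).
        winner = max(sorted(blocs), key=lambda b: blocs[b])
        votes[winner] += 1

    total = sum(votes.values())
    return {bloc: n / total for bloc, n in votes.items()}


if __name__ == "__main__":
    print("volumetric:", volumetric_share(tweets))
    print("user-level:", user_level_share(tweets))
```

In this toy data a single prolific account gives bloc A two thirds of the tweets, yet two of the three accounts favour bloc B, so the two estimators point in opposite directions.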
Demographic biases in social media usage are also a subject of ongoing discussion, particularly in Western contexts. While internet usage demographics vary across regions, the assumption of ideological segregation in social media may be overstated. Studies indicate potential underrepresentation of densely populated areas in non-spatial models and a correlation between cognitive reflection and online behavior, potentially linked to socio-demographic factors. While representativeness in online data has been explored in various countries, including Japan and Mexico, further research is needed to understand the nuances in different socio-political contexts.
Fig 1: Distribution of allegiances in Twitter data, highlighting the prevalence of near-zero allegiances, suggesting predominantly negative sentiment towards parties.
The Impact of Bots on Election Prediction Models
Automated accounts, or bots, are a significant concern in the Twitter landscape. While Twitter’s internal estimates suggest a bot presence of around 5%, academic studies indicate potentially higher figures, reaching up to 15%. Identifying bot-like behavior and accounts remains difficult. Not all bots are malicious or involved in political manipulation. Some bots do participate in political discourse, but they tend to be distributed across the political spectrum and primarily amplify content rather than build partisan followings. Evidence suggests that verified accounts may play a more central role than bots during politically charged events.
Our analysis did not specifically incorporate bot detection or filtering. However, our methodology of focusing on unique users and downplaying extreme positive or negative allegiances indirectly minimizes the influence of bots. These strategies are embedded within our alternative prediction model.
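One simple way to downplay extreme allegiances is to cap each account's score before averaging. The sketch below is an assumption made for illustration, not the study's procedure: the allegiance scale from -1 to 1, the cap value, the account names, and the function name are all hypothetical.

```python
def mean_allegiance(allegiances, cap=None):
    """Average per-account allegiance, optionally capping extreme values.

    allegiances: dict mapping account id -> score in [-1, 1] (hypothetical scale).
    cap: if given, scores are clipped to the interval [-cap, cap] first.
    """
    values = list(allegiances.values())
    if cap is not None:
        values = [max(-cap, min(cap, a)) for a in values]
    return sum(values) / len(values)


# Three moderate accounts plus two accounts with maximally negative scores,
# as an automated account posting one-sided content might produce.
allegiances = {"u1": 0.2, "u2": -0.1, "u3": 0.1, "x1": -1.0, "x2": -1.0}

raw = mean_allegiance(allegiances)              # extremes dominate the mean
capped = mean_allegiance(allegiances, cap=0.3)  # extremes contribute at most 0.3

if __name__ == "__main__":
    print(f"raw mean:    {raw:.2f}")
    print(f"capped mean: {capped:.2f}")
```

Because each account appears once and its score is bounded, an account that posts a large volume of one-sided content moves the average no more than any other strongly opinionated user.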
Fig 2: Comparison of model performance using complete versus geolocated Twitter data, demonstrating a significant performance gap and suggesting geodata’s enhanced representativeness.
Literature Comparison and Contextualizing Mexican Election Analysis
The role of social media in elections has been a prominent research area for over a decade. Much of this research has concentrated on digitally advanced economies like the United States, the United Kingdom, and Germany. Early studies in Latin America explored Twitter’s predictive power in Venezuelan, Paraguayan, and Ecuadorian elections through volumetric analysis. In Mexico, research has touched upon the intersection of politics and social media in areas like militias, civic engagement, and political disinformation. However, quantitative election predictions and in-depth analyses for Mexico remain limited in existing literature.
A recent machine learning approach to predicting Latin American presidential elections in 2018, including Mexico's, encountered challenges. While successful in Argentina, Brazil, and Colombia, the model under-predicted the winning candidate in Mexico by over 10 percentage points. The study suggested data scarcity as a potential factor. Although its authors collected a substantial number of social media posts, our study, which focuses on Mexican elections, uses a significantly larger dataset, particularly of geolocated data. Furthermore, their approach combined data from several social media platforms, which, while intended to homogenize the sample, may introduce uncertainties and complicate interpretation of the data.
Methodological Considerations in Social Media Election Modeling
Data acquisition is arguably the most challenging aspect of our analysis. It requires an iterative process of keyword query generation, data analysis, and query refinement to filter out irrelevant tweets. Restricting our search to Spanish-language tweets proved crucial in minimizing noise and focusing on election-related content.
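One refinement pass of such a query loop might look like the following Python sketch. The record layout (a dict with "lang" and "text" keys), the keyword lists, and the function name are hypothetical and chosen only for illustration; the study's actual queries are not given here.

```python
import re


def build_filter(include, exclude=(), lang="es"):
    """Return a predicate that keeps tweets in `lang` matching any include
    keyword and none of the exclude keywords (whole-word, case-insensitive)."""

    def compile_terms(terms):
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
        return re.compile(pattern, re.IGNORECASE)

    include_re = compile_terms(include)
    exclude_re = compile_terms(exclude) if exclude else None

    def keep(tweet):
        if tweet.get("lang") != lang:
            return False
        text = tweet.get("text", "")
        if not include_re.search(text):
            return False
        if exclude_re is not None and exclude_re.search(text):
            return False
        return True

    return keep


# Hypothetical sample records.
sample = [
    {"lang": "es", "text": "Las elecciones de diputados son el domingo"},
    {"lang": "en", "text": "Mexico elecciones coverage"},
    {"lang": "es", "text": "Elecciones del capitán del equipo de futbol"},
    {"lang": "es", "text": "Buenos días a todos"},
]

# First pass: broad keywords. Second pass: exclude a noise term found on inspection.
first_pass = build_filter(["elecciones", "diputados"])
second_pass = build_filter(["elecciones", "diputados"], exclude=["futbol"])

if __name__ == "__main__":
    print(len([t for t in sample if first_pass(t)]))
    print(len([t for t in sample if second_pass(t)]))
```

Each iteration inspects what the current filter retrieves, adds exclusion terms for the noise that appears, and reruns the query; the language restriction removes off-topic content before any keyword matching happens.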
We opted for a bipartisan analysis instead of focusing on individual parties in this study. In multi-party systems with coalitions, analyzing tweets can be complex due to mentions of multiple parties and coalitions within single tweets. A bipartisan approach allowed us to effectively account for coalitions and create a straightforward framework for comparing various election models. While multi-party election studies are common, often focusing on dominant parties, future research in countries like Mexico should explore multi-party models and compare them to simpler models like ours.
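A minimal Python sketch of how party mentions could be collapsed into two blocs follows. The grouping uses the two coalitions that contested the 2021 federal election (Juntos Hacemos Historia and Va por México), but treating them as the two sides of the bipartisan model is an assumption made for illustration, as are the dictionary and function names. Matching on bare party acronyms is deliberately naive: "pan", for instance, is also the Spanish word for bread.

```python
import re

# Assumed two-bloc grouping, for illustration only.
COALITIONS = {
    "juntos_hacemos_historia": {"morena", "pt", "pvem"},
    "va_por_mexico": {"pan", "pri", "prd"},
}

# Invert the mapping so each party name points at its bloc.
PARTY_TO_BLOC = {
    party: bloc for bloc, parties in COALITIONS.items() for party in parties
}


def blocs_mentioned(text):
    """Return the set of blocs whose member parties appear as words in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return {PARTY_TO_BLOC[t] for t in tokens if t in PARTY_TO_BLOC}


if __name__ == "__main__":
    print(blocs_mentioned("El PAN y el PRI van juntos"))  # one bloc
    print(blocs_mentioned("Morena contra el PRD"))        # both blocs
    print(blocs_mentioned("Hoy llueve"))                  # no bloc
```

A tweet that names several parties from the same coalition resolves to a single bloc, while a tweet that names parties from both sides is flagged as mentioning both and can be handled separately.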