Characterizing the ability of LLMs to recapitulate Americans' distributional responses to public opinion polling questions across political issues
Eric Gong, Nathan E. Sanders, Bruce Schneier
Traditional survey-based political issue polling is becoming less tractable due to increasing costs and risk of bias associated with growing non-response rates and declining coverage of key demographic groups. With researchers and pollsters seeking alternatives, Large Language Models have drawn attention for their potential to augment human population studies in polling contexts. We propose and implement a new framework for anticipating human responses to multiple-choice political issue polling questions by directly prompting an LLM to predict a distribution of responses. By comparison with a large, high-quality issue poll of the US population, the Cooperative Election Study, we evaluate how the accuracy of this framework varies across a range of demographics and questions on a variety of topics, as well as how it compares to previously proposed frameworks in which LLMs are repeatedly queried to simulate individual respondents. We find the proposed framework consistently exhibits more accurate predictions than individual querying at significantly lower cost. In addition, we find its performance varies much more systematically and predictably across demographics and questions, making it possible for those performing AI polling to better anticipate model performance using only information available before a query is issued.
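As a rough illustration of the contrast the abstract draws, a minimal sketch follows; query_llm is a hypothetical stand-in for any chat-completion API, and the prompt wording and JSON parsing are our assumptions rather than the authors' exact protocol.

```python
import json

def query_llm(prompt: str) -> str:
    # Hypothetical wrapper: wire up your LLM provider here.
    raise NotImplementedError

def predict_distribution(question: str, options: list[str], demographic: str) -> dict:
    # Proposed framework: one call returns the whole response distribution.
    prompt = (
        f"Among US adults who are {demographic}, estimate the share choosing each "
        f"option for the survey question below. Answer as JSON mapping each option "
        f"to a probability; probabilities must sum to 1.\n"
        f"Question: {question}\nOptions: {options}"
    )
    return json.loads(query_llm(prompt))

def simulate_individuals(question, options, demographic, n=100):
    # Baseline framework: n calls, one simulated respondent each -- roughly
    # n times the cost of the single distribution query above.
    counts = {o: 0 for o in options}
    for _ in range(n):
        prompt = (f"You are a US adult who is {demographic}. "
                  f"Question: {question}\nOptions: {options}\n"
                  f"Reply with exactly one option.")
        answer = query_llm(prompt).strip()
        if answer in counts:
            counts[answer] += 1
    total = sum(counts.values()) or 1
    return {o: c / total for o, c in counts.items()}
```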
Policies frozen in silicon: using WPR to expose the politics of problem-solution configurations in technical artifacts
Jörgen Behrendtz, Lina Rahm
Design is often characterized as an act of problem-solving. This is a perspective that, while pervasive, risks reducing complex socio-technical conditions to easily fixable issues. This paper critiques the ideology of "design as problem-solving", highlighting its culmination in technological solutionism, where societal and human challenges are reframed as technical problems awaiting technical answers. Drawing on critiques and the recognition of "wicked problems", we argue that design must also be understood as a process of problem-framing, emphasizing the interpretive work involved in defining what counts as a problem and why. To advance this analytical perspective, we propose applying the What's the Problem Represented to be? (WPR) approach from critical policy studies to design and technology. By treating artifacts as materialized problem representations, WPR allows for the systematic unpacking of the ideological, cultural, and political assumptions encoded in technological forms. This analytical lens can reveal hidden problematisations within artifacts, foster reflexive design practice, and empirically challenge techno-solutionism. Ultimately, integrating WPR into design research enriches both design theory and philosophy of technology by offering a method to interrogate how technologies shape, and are shaped by, the questions they claim to answer.
An affinity based opinion dynamics model for the evolving pattern of political polarization
Zhang Xiaoming, Hu Yuzhong, Zhang Yiming
Political polarization has attracted considerable research attention in recent years. We have developed an opinion dynamics model with an affective homophily effect and a national social norm effect to describe this phenomenon. The time evolution of the polarization between the two parties and of the spread of opinions within each party is affected by three factors: the repulsive effect between the two parties, the attractive and repulsive effects between the members of each party, and the national social norm effect that pulls the opinions of all members towards a common norm. The model is internally consistent and is applied to the simulation of the symmetric patterns of polarization and spread of the opinion distributions in the U.S. Congress, and the results align well with 154 years of recorded data. The time evolution of the strength of the national social norm effect is obtained and is consistent with the important historical events that occurred during the past century and a half.
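To make the three-force structure concrete, here is a minimal simulation sketch; the saturating repulsion, coefficients, and noise term are illustrative placeholders, not the authors' calibrated model (which also includes intra-party repulsion).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(-1.0, 0.3, 100)        # opinions, party A
y = rng.normal(+1.0, 0.3, 100)        # opinions, party B
norm, a, b, c, dt = 0.0, 0.05, 0.02, 0.01, 0.1

for _ in range(2000):
    # three forces: inter-party repulsion, intra-party attraction,
    # and the pull of the national social norm; plus small noise
    fx = a * np.tanh(x - y.mean()) + b * (x.mean() - x) + c * (norm - x)
    fy = a * np.tanh(y - x.mean()) + b * (y.mean() - y) + c * (norm - y)
    x += dt * fx + np.sqrt(dt) * 0.05 * rng.normal(size=x.size)
    y += dt * fy + np.sqrt(dt) * 0.05 * rng.normal(size=y.size)

print("polarization gap:", y.mean() - x.mean())
print("within-party spreads:", x.std(), y.std())
```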
We Politely Insist: Your LLM Must Learn the Persian Art of Taarof
Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian
et al.
Large language models (LLMs) struggle to navigate culturally specific communication norms, limiting their effectiveness in global contexts. We focus on Persian taarof, a social norm in Iranian interactions, which is a sophisticated system of ritual politeness that emphasizes deference, modesty, and indirectness, yet remains absent from existing cultural benchmarks. We introduce TaarofBench, the first benchmark for evaluating LLM understanding of taarof, comprising 450 role-play scenarios covering 12 common social interaction topics, validated by native speakers. Our evaluation of five frontier LLMs reveals substantial gaps in cultural competence, with accuracy rates 40-48% below native speakers when taarof is culturally appropriate. Performance varies between interaction topics, improves with Persian-language prompts, and exhibits gender-based asymmetries. We also show that responses rated "polite" by standard metrics often violate taarof norms, indicating the limitations of Western politeness frameworks. Through supervised fine-tuning and Direct Preference Optimization, we achieve improvements of 21.8% and 42.3%, respectively, in model alignment with cultural expectations. Our human study with 33 participants (11 native Persian, 11 heritage, and 11 non-Iranian speakers) establishes baselines across varying degrees of familiarity with Persian norms. This work lays the foundation for developing diverse and culturally aware LLMs, enabling applications that better navigate complex social interactions.
Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms
Patrick Parschan, Charlott Jakob
This article presents the first systematic review of unsupervised and semi-supervised computational text-based ideal point estimation (CT-IPE) algorithms, methods designed to infer latent political positions from textual data. These algorithms are widely used in political science, communication, computational social science, and computer science to estimate ideological preferences from parliamentary speeches, party manifestos, and social media. Over the past two decades, their development has closely followed broader NLP trends -- beginning with word-frequency models and most recently turning to large language models (LLMs). While this trajectory has greatly expanded the methodological toolkit, it has also produced a fragmented field that lacks systematic comparison and clear guidance for applied use. To address this gap, we identified 25 CT-IPE algorithms through a systematic literature review and conducted a manual content analysis of their modeling assumptions and development contexts. To compare them meaningfully, we introduce a conceptual framework that distinguishes how algorithms generate, capture, and aggregate textual variance. On this basis, we identify four methodological families -- word-frequency, topic modeling, word embedding, and LLM-based approaches -- and critically assess their assumptions, interpretability, scalability, and limitations. Our review offers three contributions. First, it provides a structured synthesis of two decades of algorithm development, clarifying how diverse methods relate to one another. Second, it translates these insights into practical guidance for applied researchers, highlighting trade-offs in transparency, technical requirements, and validation strategies that shape algorithm choice. Third, it emphasizes that differences in estimation outcomes across algorithms are themselves informative, underscoring the need for systematic benchmarking.
Agent-Based Simulations of Online Political Discussions: A Case Study on Elections in Germany
Abdul Sittar, Simon Münker, Fabio Sartori
et al.
User engagement on social media platforms is influenced by historical context, time constraints, and reward-driven interactions. This study presents an agent-based simulation approach that models user interactions, considering past conversation history, motivation, and resource constraints. Utilizing German Twitter data on political discourse, we fine-tune AI models to generate posts and replies, incorporating sentiment analysis, irony detection, and offensiveness classification. The simulation employs a myopic best-response model to govern agent behavior, accounting for decision-making based on expected rewards. Our results highlight the impact of historical context on AI-generated responses and demonstrate how engagement evolves under varying constraints.
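A minimal sketch of a myopic best-response rule of the kind described; the agent attributes (cost, budget, predict_engagement) are hypothetical stand-ins for the study's fine-tuned components.

```python
def myopic_best_response(agent, thread, actions=("post", "reply", "idle")):
    # Pick the action maximizing immediate expected reward given the current
    # conversation state, subject to a per-step resource budget.
    def expected_reward(action):
        if action == "idle":
            return 0.0
        cost = agent.cost[action]
        if agent.budget < cost:
            return float("-inf")  # infeasible under the resource constraint
        engagement = agent.predict_engagement(thread, action)  # model estimate
        return engagement - cost
    return max(actions, key=expected_reward)
```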
PoliTok-DE: A Multimodal Dataset of Political TikToks and Deletions From Germany
Tomas Ruiz, Andreas Nanz, Ursula Kristin Schmid
et al.
We present PoliTok-DE, a large-scale multimodal dataset (video, audio, images, text) of TikTok posts related to the 2024 Saxony state election in Germany. The corpus contains over 195,000 posts published between 1 July 2024 and 30 November 2024, of which over 18,000 (17.3%) were subsequently deleted from the platform. Posts were identified via the TikTok research API and complemented with web scraping to retrieve full multimodal media and metadata. PoliTok-DE supports computational social science across substantive and methodological agendas: substantive work on intolerance and political communication, and methodological work on platform policies around deleted content and on qualitative-quantitative multimodal research. To illustrate one possible analysis, we report a case study on the co-occurrence of intolerance and entertainment using an annotated subset. The dataset of post IDs is publicly available on Hugging Face, and full content can be hydrated with our provided code. Access to the deleted content is restricted and can be requested for research purposes.
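A sketch of how the ID-only release might be loaded before hydration; the Hugging Face repo id and column name below are placeholders (check the paper for the real ones), and hydration itself should use the authors' provided code.

```python
from datasets import load_dataset

ids = load_dataset("politok-de/post-ids", split="train")  # hypothetical repo id
deleted = ids.filter(lambda r: r["deleted"])              # assumed column name
print(len(ids), "posts,", len(deleted), "marked as deleted")
```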
B-Call: Integrating Ideological Position and Political Cohesion in Legislative Voting Models
Juan Reutter, Sergio Toro, Lucas Valenzuela
et al.
This paper combines two significant areas of political science research: measuring individual ideological position and cohesion. Although both approaches help analyze legislative behaviors, no unified model currently integrates these dimensions. To fill this gap, the paper proposes a methodology called B-Call that combines ideological positioning with voting cohesion, treating votes as random variables. The model is empirically validated using roll-call data from the legislatures of the United States, Brazil, and Chile, which represent diverse legislative dynamics. The analysis aims to capture the complexities of voting and legislative behaviors, resulting in a two-dimensional indicator. This study addresses gaps in current legislative voting models, particularly in contexts with limited party control.
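To illustrate the two dimensions the indicator combines, here is a naive stand-in (PCA-based position plus majority-agreement cohesion) computed from a synthetic roll-call matrix; it is not the B-Call estimator itself, which treats votes as random variables.

```python
import numpy as np

votes = np.random.default_rng(1).integers(0, 2, (50, 200))  # legislators x bills
party = np.array([0] * 25 + [1] * 25)

# crude position: first principal component of the centered vote matrix
centered = votes - votes.mean(axis=0)
position = np.linalg.svd(centered, full_matrices=False)[0][:, 0]

# crude cohesion: rate of agreement with one's party majority on each bill
cohesion = np.empty(len(votes))
for i, p in enumerate(party):
    majority = (votes[party == p].mean(axis=0) >= 0.5).astype(int)
    cohesion[i] = (votes[i] == majority).mean()

indicator = np.column_stack([position, cohesion])  # the two-dimensional score
```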
Fact-checking with Generative AI: A Systematic Cross-Topic Examination of LLMs Capacity to Detect Veracity of Political Information
Elizaveta Kuznetsova, Ilaria Vitulano, Mykola Makhortykh
et al.
The purpose of this study is to assess how large language models (LLMs) can be used for fact-checking and to contribute to the broader debate on the use of automated means for veracity identification. To achieve this purpose, we use an AI auditing methodology that systematically evaluates the performance of five LLMs (ChatGPT 4, Llama 3 (70B), Llama 3.1 (405B), Claude 3.5 Sonnet, and Google Gemini) using prompts regarding a large set of statements fact-checked by professional journalists (16,513). Specifically, we use topic modeling and regression analysis to investigate which factors (e.g., topic of the prompt or the LLM type) affect evaluations of true, false, and mixed statements. Our findings reveal that while ChatGPT 4 and Google Gemini achieved higher accuracy than other models, overall performance across models remains modest. Notably, the results indicate that models are better at identifying false statements, especially on sensitive topics such as COVID-19, American political controversies, and social issues, suggesting possible guardrails that may enhance accuracy on these topics. The major implication of our findings is that there are significant challenges in using LLMs for fact-checking, including significant variation in performance across different LLMs and unequal quality of outputs for specific topics, which can be attributed to deficits of training data. Our research highlights the potential and limitations of LLMs in political fact-checking, suggesting potential avenues for further improvements in guardrails as well as fine-tuning.
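A sketch of the audit loop the study describes, assuming a hypothetical ask_model wrapper; real verdict parsing and the topic-level regression would require considerably more care.

```python
def ask_model(model: str, prompt: str) -> str:
    # Hypothetical wrapper: call the given model's API here.
    raise NotImplementedError

def audit(models, statements):
    # statements: list of {"text": ..., "label": "true" | "false" | "mixed"},
    # where labels come from professional fact-checkers.
    results = {m: 0 for m in models}
    for m in models:
        for s in statements:
            verdict = ask_model(m, "Is this statement true, false, or mixed? "
                                   "Answer with one word.\n" + s["text"])
            results[m] += verdict.strip().lower() == s["label"]
    return {m: hits / len(statements) for m, hits in results.items()}
```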
MEI Digital Journey
Maria Eduarda Silva Sant'Ana, Kascilene Gonçalves Machado, Stela Cristina Hott Corrêa
et al.
Governador Valadares has around 25,000 individual microentrepreneurs (MEI), accounting for 72% of all companies established in the municipality (Simples Nacional, 2022). This business organization model is highly relevant, since it is the basis of household income generation for many families in Valadares. The Municipal Secretariat for Development, Science, Technology and Innovation (SMDCTI) has supported and promoted entrepreneurship and innovation among the municipality's companies, notably the MEI, through several projects, many of them in partnership with UFJF/GV. In this context, to provide greater knowledge to microentrepreneurs, a free in-person training course was held over three weeks, covering the main topics of digital marketing. The objective of this experience report is to detail the training process and to highlight the challenges faced and lessons learned in developing the project and training the MEI in digital marketing. Throughout the project, we observed the importance of digital marketing for individual microentrepreneurs and the important role it played in the development of their activities.
How good is GPT at writing political speeches for the White House?
Jacques Savoy
Using large language models (LLMs), computers are able to generate a written text in response to a user request. As this pervasive technology can be applied in numerous contexts, this study analyses the written style of one LLM called GPT by comparing its generated speeches with those of recent US presidents. To achieve this objective, the State of the Union (SOTU) addresses written by Reagan through Biden are contrasted with those produced by both the GPT-3.5 and GPT-4o versions. Compared to US presidents, GPT tends to overuse the lemma "we" and to produce shorter messages with, on average, longer sentences. Moreover, GPT adopts an optimistic tone, opting more often for political (e.g., president, Congress), symbolic (e.g., freedom), and abstract terms. Even when imposing an author's style on GPT, the resulting speech remains distinct from addresses written by the target author. Finally, the two GPT versions present distinct characteristics, but both appear overall dissimilar to true presidential messages.
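Two of the reported style measures are easy to approximate; the crude tokenization below is our simplification of the paper's lemma-based counts.

```python
import re

def style_profile(speech: str) -> dict:
    # Relative frequency of "we" and mean sentence length (in tokens).
    sentences = [s for s in re.split(r"[.!?]+", speech) if s.strip()]
    tokens = re.findall(r"[a-z']+", speech.lower())
    return {
        "we_rate": tokens.count("we") / len(tokens),
        "mean_sentence_len": len(tokens) / len(sentences),
    }

# Comparing the profile of a GPT-generated SOTU-style text against a real
# address: a higher we_rate and longer sentences would match the findings.
```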
A modified Hegselmann-Krause model for interacting voters and political parties
Patrick H. Cahill, Georg A. Gottwald
The Hegselmann--Krause model is a prototypical model for opinion dynamics. It models the stochastic time evolution of an agent's or voter's opinion in response to the opinion of other like-minded agents. The Hegselmann--Krause model only considers the opinions of voters; we extend it here by incorporating the dynamics of political parties which influence and are influenced by the voters. We show in numerical simulations for $1$- and $2$-dimensional opinion spaces that, as for the original Hegselmann--Krause model, the modified model exhibits opinion cluster formation as well as a phase transition from disagreement to consensus. We provide an analytical sufficient condition for the formation of unanimous consensus in which voters and parties collapse to the same point in opinion space in the deterministic case. Using mean-field theory, we further derive an approximation for the critical noise strength delineating consensus from non-consensus in the stochastically driven modified Hegselmann--Krause model. We compare our analytical findings with simulations of the modified Hegselmann--Krause model.
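For reference, the classic deterministic Hegselmann-Krause update, together with one plausible form of voter-party coupling; the coupling term is our illustrative assumption and the paper's exact formulation may differ.

```latex
% Classic Hegselmann--Krause update: agent i averages the opinions of all
% agents within its confidence bound \epsilon.
x_i(t+1) \;=\; \frac{1}{|N_i(t)|} \sum_{j \in N_i(t)} x_j(t),
\qquad
N_i(t) \;=\; \bigl\{\, j : \lVert x_j(t) - x_i(t) \rVert \le \epsilon \,\bigr\}.
%
% One illustrative two-way coupling with party positions p_k (our assumption):
% each party drifts toward the mean opinion of its current supporters S_k,
% while supporters may in turn include p_k in their averaging set.
p_k(t+1) \;=\; p_k(t) + \mu \Bigl( \tfrac{1}{|S_k(t)|} \textstyle\sum_{i \in S_k(t)} x_i(t) - p_k(t) \Bigr),
\qquad
S_k(t) \;=\; \{\, i : \text{party } k \text{ is nearest to } x_i(t) \,\}.
```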
Selecting Between BERT and GPT for Text Classification in Political Science Research
Yu Wang, Wen Qu, Xin Ye
Political scientists often grapple with data scarcity in text classification. Recently, fine-tuned BERT models and their variants have gained traction as effective solutions to address this issue. In this study, we investigate the potential of GPT-based models combined with prompt engineering as a viable alternative. We conduct a series of experiments across various classification tasks, differing in the number of classes and complexity, to evaluate the effectiveness of BERT-based versus GPT-based models in low-data scenarios. Our findings indicate that while zero-shot and few-shot learning with GPT models provide reasonable performance and are well-suited for early-stage research exploration, they generally fall short of, or at best match, the performance of BERT fine-tuning, particularly as the training set reaches a substantial size (e.g., 1,000 samples). We conclude by comparing these approaches in terms of performance, ease of use, and cost, providing practical guidance for researchers facing data limitations. Our results are particularly relevant for those engaged in quantitative text analysis in low-resource settings or with limited labeled data.
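A minimal sketch of the BERT fine-tuning baseline; the model name, toy data, and hyperparameters are illustrative, not the paper's exact setup.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy labeled corpus standing in for a researcher's annotated documents.
train = Dataset.from_dict({"text": ["...labeled document..."] * 8,
                           "label": [0, 1] * 4})
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
train = train.map(lambda b: tok(b["text"], truncation=True,
                                padding="max_length", max_length=256),
                  batched=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
Trainer(model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3,
                               per_device_train_batch_size=8),
        train_dataset=train).train()

# The GPT alternative needs no training set: a zero-shot prompt such as
# "Classify the following text as 0 or 1: ..." is sent to the model's API
# and the returned token is parsed as the label.
```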
Large Language Models' Detection of Political Orientation in Newspapers
Alessio Buscemi, Daniele Proverbio
Democratic opinion-forming may be manipulated if newspapers' alignment with political or economic interests is ambiguous. Various methods have been developed to better understand newspapers' positioning. Recently, the advent of Large Language Models (LLMs), and particularly pre-trained LLM chatbots such as ChatGPT or Gemini, holds disruptive potential to assist researchers and citizens alike. However, little is known about whether LLM assessments are trustworthy: do individual LLMs agree with experts' assessments, and do different LLMs answer consistently with one another? In this paper, we address the second challenge specifically. We compare how four widely employed LLMs rate the positioning of newspapers, and we check whether their answers align with one another. We observe that this is not the case. Over a worldwide dataset, articles in newspapers are positioned strikingly differently by individual LLMs, hinting at inconsistent training or excessive randomness in the algorithms. We thus raise a warning when deciding which tools to use, and we call for better training and algorithm development to close this significant gap in a matter highly sensitive for democracy and societies worldwide. We also call for community engagement in benchmark evaluation through our open initiative, navai.pro.
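The consistency question reduces to agreement statistics over per-article scores; a minimal sketch, with placeholder scores standing in for real LLM outputs.

```python
import numpy as np

scores = {  # model -> per-article positioning score, e.g. -1 (left) .. +1 (right)
    "llm_a": np.array([-0.8, 0.1, 0.5, -0.2]),
    "llm_b": np.array([-0.2, 0.6, 0.4, 0.3]),
}
names = list(scores)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = np.corrcoef(scores[names[i]], scores[names[j]])[0, 1]
        print(names[i], "vs", names[j], f"r = {r:.2f}")  # low r => inconsistent
```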
The process of curricularizing extension in the Physical Education program at the Universidade do Estado de Minas Gerais (UEMG), Ibirité Unit
Fernanda Abbatepietro Novaes, Diogo Rodrigues Puchta
The publication of Resolution CNE/CES No. 7/2018, which establishes the Guidelines for Extension in Brazilian Higher Education, intensified the debate on the role of university extension in fulfilling the University's social function, as well as on the need to reformulate undergraduate curricula. This text reports the reflections and procedures adopted by the Physical Education program of the Universidade do Estado de Minas Gerais, Ibirité unit, to comply with this resolution. We start from the understanding that claiming a legitimate space for extension activities in curricula implies rejecting the accessory or welfare-oriented character often attributed to extension, and highlighting its potential to transform society and the university itself. Finally, we outline the prospects we envision for the program and the challenges that lie ahead.
Development of educational podcasts for youth and adult education
Fernanda Colombari de Salles Roselino, Luciani Ester Tenani, Fabio Fernandes Villela
The COVID-19 pandemic highlighted the need to expand digital literacy practices in the country, especially for the elderly population, which remained isolated for a long period. This report aims to describe the activities developed to mitigate the effects of the absence of in-person contact in classes offered to the elderly population served by the Youth and Adult Education Program (PEJA) run by the Universidade Estadual Paulista in São José do Rio Preto. The activities were developed through video lessons and podcasts created to deliver content on basic Mathematics and Portuguese aimed at digital literacy. Nine video lesson episodes and ten podcast episodes were created, and instructions were sent to students on how to access these media on the internet. The development of this material was aligned with Sustainable Development Goal 4 (Quality Education) of the United Nations. The teaching material directly benefited the undergraduate scholarship holder on the project, who acquired skills for her future teaching practice, and, above all, the population served by PEJA-Rio Preto, as well as the local community, which expanded its digital literacy practices and had free access to basic education content during the period of social isolation.
Deep Maxout Network-based Feature Fusion and Political Tangent Search Optimizer enabled Transfer Learning for Thalassemia Detection
Hemn Barzan Abdalla, Awder Ahmed, Guoquan Li
et al.
Thalassemia is a heritable blood disorder caused by a genetic defect that impairs the production of hemoglobin polypeptide chains. However, its precise frequency and distribution in affected regions remain poorly understood. Knowledge of the frequency of thalassemia occurrence and of the underlying mutations is thus a significant step toward prevention, control, and treatment planning. Here, Political Tangent Search Optimizer based Transfer Learning (PTSO_TL) is introduced for thalassemia detection. Initially, input data obtained from a particular dataset are normalized in the data normalization stage, which uses quantile normalization; the data are then passed to the feature fusion phase, in which Weighted Euclidean Distance with a Deep Maxout Network (DMN) is utilized. Thereafter, data augmentation is performed using an oversampling method to increase data dimensionality. Lastly, thalassemia detection is carried out by transfer learning (TL), wherein a convolutional neural network (CNN) is utilized with hyperparameters from a trained model such as Xception. TL is tuned by PTSO, and the PTSO training algorithm is obtained by merging the Political Optimizer (PO) and the Tangent Search Algorithm (TSA). PTSO_TL obtained maximal precision, recall, and f-measure values of about 94.3%, 96.1%, and 95.2%, respectively.
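The abstract names quantile normalization explicitly; a sketch of the classic rank-based procedure follows, though whether the paper uses exactly this variant is not stated.

```python
import numpy as np

def quantile_normalize(X: np.ndarray) -> np.ndarray:
    # Force every column (sample) onto the same distribution: replace each
    # value with the mean of the values at the same rank across columns.
    ranks = X.argsort(axis=0).argsort(axis=0)         # per-column ranks
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)  # reference distribution
    return mean_quantiles[ranks]
```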
Optimizing text representations to capture (dis)similarity between political parties
Tanise Ceron, Nico Blokker, Sebastian Padó
Even though fine-tuned neural language models have been pivotal in enabling "deep" automatic text analysis, optimizing text representations for specific applications remains a crucial bottleneck. In this study, we look at this problem in the context of a task from computational social science, namely modeling pairwise similarities between political parties. Our research question is what level of structural information is necessary to create robust text representations, contrasting a strongly informed approach (which uses both claim span and claim category annotations) with approaches that forgo one or both types of annotation in favor of document structure-based heuristics. Evaluating our models on the manifestos of German parties for the 2021 federal election, we find that heuristics that maximize within-party over between-party similarity, along with a normalization step, lead to reliable party similarity prediction without the need for manual annotation.
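The evaluation quantity, within-party versus between-party similarity, can be sketched directly over embeddings; the embeddings and party labels below are random placeholders for the manifesto-derived ones.

```python
import numpy as np

emb = np.random.default_rng(2).normal(size=(300, 768))  # claim embeddings
party = np.random.default_rng(3).integers(0, 6, 300)    # party label per claim

unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sim = unit @ unit.T                                      # cosine similarities
same = party[:, None] == party[None, :]
off_diag = ~np.eye(len(emb), dtype=bool)

within = sim[same & off_diag].mean()
between = sim[~same].mean()
print("within - between margin:", within - between)      # larger is better
```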
Software-Supported Audits of Decision-Making Systems: Testing Google and Facebook's Political Advertising Policies
J. Nathan Matias, Austin Hounsel, Nick Feamster
How can society understand and hold accountable complex human and algorithmic decision-making systems whose systematic errors are opaque to the public? These systems routinely make decisions on individual rights and well-being, and on protecting society and the democratic process. Practical and statistical constraints on external audits--such as dimensional complexity--can lead researchers and regulators to miss important sources of error in these complex decision-making systems. In this paper, we design and implement a software-supported approach to audit studies that auto-generates audit materials and coordinates volunteer activity. We implemented this software in the case of political advertising policies enacted by Facebook and Google during the 2018 U.S. election. Guided by this software, a team of volunteers posted 477 auto-generated ads and analyzed the companies' actions, finding systematic errors in how companies enforced policies. We find that software can overcome some common constraints of audit studies, within limitations related to sample size and volunteer capacity.
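A sketch of the auto-generation step: crossing message templates with attribute values to cover the policy space systematically; the templates and attributes are invented placeholders, not the study's actual ads.

```python
from itertools import product

templates = ["Learn about {topic} before the election.",
             "Support {topic} in your community."]
topics = ["voting by mail", "a local park", "veterans' benefits"]
placements = ["search", "display"]

# Each variant is then submitted by a volunteer and the platform's
# approval decision recorded for analysis.
ads = [{"text": t.format(topic=topic), "placement": p}
       for t, topic, p in product(templates, topics, placements)]
print(len(ads), "ad variants generated")
```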
Political Bias and Factualness in News Sharing across more than 100,000 Online Communities
Galen Weld, Maria Glenski, Tim Althoff
As civil discourse increasingly takes place online, misinformation and the polarization of news shared in online communities have become ever more relevant concerns, with real-world harms across our society. Studying online news sharing at scale is challenging due to the massive volume of content which is shared by millions of users across thousands of communities. Therefore, existing research has largely focused on specific communities or specific interventions, such as bans. However, understanding the prevalence and spread of misinformation and polarization more broadly, across thousands of online communities, is critical for the development of governance strategies, interventions, and community design. Here, we conduct the largest study of news sharing on reddit to date, analyzing more than 550 million links spanning 4 years. We use non-partisan news source ratings from Media Bias/Fact Check to annotate links to news sources with their political bias and factualness. We find that, compared to left-leaning communities, right-leaning communities have 105% more variance in the political bias of their news sources, and more links to relatively more biased sources, on average. We observe that reddit users' voting and re-sharing behaviors generally decrease the visibility of extremely biased and low factual content, which receives 20% fewer upvotes and 30% fewer exposures from crossposts than more neutral or more factual content. This suggests that reddit is more resilient to low factual content than Twitter. We show that extremely biased and low factual content is very concentrated, with 99% of such content being shared in only 0.5% of communities, giving credence to the recent strategy of community-wide bans and quarantines.
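The headline comparison is a grouped variance; a minimal sketch with placeholder data and assumed column names.

```python
import pandas as pd

links = pd.DataFrame({
    "community_leaning": ["left", "left", "right", "right", "right"],
    "source_bias": [-1.0, -0.5, -0.5, 1.5, 2.0],  # e.g. -3 (far left) .. +3
})
# Variance of news-source bias by community leaning; the paper reports
# roughly 105% higher variance in right-leaning communities.
print(links.groupby("community_leaning")["source_bias"].var())
```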