LinguaGame: A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation
Yuxiao Ye, Yiming Zhang, Yiran Ma
et al.
Large Language Models (LLMs) have enabled Multi-Agent Systems (MASs) where agents interact through natural language to solve complex tasks or simulate multi-party dialogues. Recent work on LLM-based MASs has mainly focused on architecture design, such as role assignment and workflow orchestration. In contrast, this paper targets the interaction process itself, aiming to improve agents' communication efficiency by helping them convey their intended meaning more effectively through language. To this end, we propose LinguaGame, a linguistically-grounded game-theoretic paradigm for multi-agent dialogue generation. Our approach models dialogue as a signalling game over communicative intents and strategies, solved with a training-free equilibrium approximation algorithm for inference-time decision adjustment. Unlike prior game-theoretic MASs, whose game designs are often tightly coupled with task-specific objectives, our framework relies on linguistically informed reasoning with minimal task-specific coupling. Specifically, it treats dialogue as intentional and strategic communication, requiring agents to infer what others aim to achieve (intents) and how they pursue those goals (strategies). We evaluate our framework in simulated courtroom proceedings and debates, with human expert assessments showing significant gains in communication efficiency.
What if Deception Cannot be Detected? A Cross-Linguistic Study on the Limits of Deception Detection from Text
Aswathy Velutharambath, Kai Sassenberg, Roman Klinger
Can deception be detected solely from written text? Cues of deceptive communication are inherently subtle, even more so in text-only communication. Yet, prior studies have reported considerable success in automatic deception detection. We hypothesize that such findings are largely driven by artifacts introduced during data collection and do not generalize beyond specific datasets. We revisit this assumption by introducing a belief-based deception framework, which defines deception as a misalignment between an author's claims and true beliefs, irrespective of factual accuracy, allowing deception cues to be studied in isolation. Based on this framework, we construct three corpora, collectively referred to as DeFaBel, including a German-language corpus of deceptive and non-deceptive arguments and a multilingual version in German and English, each collected under varying conditions to account for belief change and enable cross-linguistic analysis. Using these corpora, we evaluate commonly reported linguistic cues of deception. Across all three DeFaBel variants, these cues show negligible, statistically insignificant correlations with deception labels, contrary to prior work that treats such cues as reliable indicators. We further benchmark against other English deception datasets following similar data collection protocols. While some show statistically significant correlations, effect sizes remain low and, critically, the set of predictive cues is inconsistent across datasets. We also evaluate deception detection using feature-based models, pretrained language models, and instruction-tuned large language models. While some models perform well on established deception datasets, they consistently perform near chance on DeFaBel. Our findings challenge the assumption that deception can be reliably inferred from linguistic cues and call for rethinking how deception is studied and modeled in NLP.
IndicEval-XL: Bridging Linguistic Diversity in Code Generation Across Indic Languages
Ujjwal Singh, Aditi Sharma, Nikhil Gupta
et al.
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation from natural language prompts, revolutionizing software development workflows. As we advance towards agent-based development paradigms, these models form the cornerstone of next-generation software development lifecycles. However, current benchmarks for evaluating multilingual code generation capabilities are predominantly English-centric, limiting their applicability across the global developer community. To address this limitation, we present IndicEval-XL, a comprehensive benchmark for code generation that incorporates 6 major Indic languages, collectively spoken by approximately 14% of the world's population. Our benchmark bridges these languages with 12 programming languages, creating a robust evaluation framework. This work is particularly significant given India's representation of one-eighth of the global population and the crucial role Indic languages play in Indian society. IndicEval-XL represents a significant step toward expanding the linguistic diversity in code generation systems and evaluation frameworks. By developing resources that support multiple languages, we aim to make AI-powered development tools more inclusive and accessible to developers of various linguistic backgrounds. To facilitate further research and development in this direction, we make our dataset and evaluation benchmark publicly available at https://github.com/telekom/IndicEval-XL
How Persuasive Could LLMs Be? A First Study Combining Linguistic-Rhetorical Analysis and User Experiments
Daniel Raffini, Agnese Macori, Lorenzo Porcaro
et al.
This study examines the rhetorical and linguistic features of argumentative texts generated by ChatGPT on ethically nuanced topics and investigates their persuasive impact on human readers. Through a user study involving 62 participants and pre-post interaction surveys, the paper analyzes how exposure to AI-generated arguments affects opinion change and user perception. A linguistic and rhetorical analysis of the generated texts reveals a consistent argumentative macrostructure, reliance on formulaic expressions, and limited stylistic richness. While ChatGPT demonstrates proficiency in constructing coherent argumentative texts, its persuasive efficacy appears constrained, particularly on topics involving ethical issues. The study finds that while participants often acknowledge the benefits highlighted by ChatGPT, ethical concerns tend to persist or even intensify post-interaction. The results also vary depending on the topic. These findings offer new insights into AI-generated persuasion in ethically sensitive domains and provide a basis for future research.
Toward Objective and Interpretable Prosody Evaluation in Text-to-Speech: A Linguistically Motivated Approach
Cedric Chan, Jianjing Kuang
Prosody is essential for speech technology, shaping comprehension, naturalness, and expressiveness. However, current text-to-speech (TTS) systems still struggle to accurately capture human-like prosodic variation, in part because existing evaluation methods for prosody remain limited. Traditional metrics like Mean Opinion Score (MOS) are resource-intensive, inconsistent, and offer little insight into why a system sounds unnatural. This study introduces a linguistically informed, semi-automatic framework for evaluating TTS prosody through a two-tier architecture that mirrors human prosodic organization. The method uses quantitative linguistic criteria to evaluate synthesized speech against human speech corpora across multiple acoustic dimensions. By integrating discrete and continuous prosodic measures, it provides objective and interpretable metrics of both event placement and cue realization, while accounting for the natural variability observed across speakers and prosodic cues. Results show strong correlations with perceptual MOS ratings while revealing model-specific weaknesses that traditional perceptual tests alone cannot capture. This approach provides a principled path toward diagnosing, benchmarking, and ultimately improving the prosodic naturalness of next-generation TTS systems.
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat, Mutsumi Nakamura, Shankar Kailas
et al.
Deriving inferences from heterogeneous inputs (such as images, text, and audio) is an important skill for humans to perform day-to-day tasks. A similar ability is desirable for the development of advanced Artificial Intelligence (AI) systems. While state-of-the-art models are rapidly closing the gap with human-level performance on diverse computer vision and NLP tasks separately, they struggle to solve tasks that require joint reasoning over visual and textual modalities. Inspired by GLUE (Wang et al., 2018), a multitask benchmark for natural language understanding, we propose VL-GLUE in this paper. VL-GLUE consists of over 100k samples spanning seven different tasks, which at their core require visuo-linguistic reasoning. Moreover, our benchmark comprises diverse image types (from synthetically rendered figures and day-to-day scenes to charts and complex diagrams) and includes a broad variety of domain-specific text (from cooking, politics, and sports to high-school curricula), demonstrating the need for multi-modal understanding in the real world. We show that this benchmark is quite challenging for existing large-scale vision-language models and encourage the development of systems that possess robust visuo-linguistic reasoning capabilities.
A Multi-Task Role-Playing Agent Capable of Imitating Character Linguistic Styles
Siyuan Chen, Qingyi Si, Chenxu Yang
et al.
The advent of large language models (LLMs) has significantly propelled the advancement of Role-Playing Agents (RPAs). However, current RPAs predominantly focus on mimicking a character's fundamental attributes while neglecting the replication of linguistic style, and they cannot effectively replicate characters when performing tasks beyond multi-turn dialogue, which results in generated responses that lack authenticity. This limitation stems from existing character datasets, which lack collections of character quotations and are limited to multi-turn dialogue tasks; this constrains RPA performance in other task domains and prevents the agents from mimicking a character's linguistic style. To address this gap, we developed a multi-task role-playing dataset named MRstyle, which encompasses a substantial number of real individuals along with their quotations and covers seven different tasks. On this basis, we develop StyleRPA, a Multi-Task Role-Playing Agent (MRPA) that significantly outperforms recent open-source LLM and RPA baselines on 7 tasks including Dialogue, Dictionary, Composition, Story Generation, Product Description, Music Commentary, and Open Question Answering. The code and data will be released.
Listening to the Opus Dei Information Office in Navarra and the Basque Country: Good communication practices at the service of the governance of an institution
Inma Juan Pardo
Communication is essential for organisations. The interaction of institutions with their publics and their reputation management depend on it. ‘An organisation that does not listen, or listens badly to its stakeholders and publics, will fail in its public communication,’ as Jim Macnamara says. This article deals with the listening work of the Opus Dei Information Office of the Basque Country and Navarra carried out in different phases: initiation, identification of interest groups, open listening, analysis of the findings, conclusions and transmission. It is offered as a good communication practice at the service of government that could be applied, with the necessary adjustments, to other institutions, especially Church communication offices. The text is divided into two parts: in the first, we review some concepts on institutional communication, governance, intangibles, reputation and listening, which serve as a framework; in the second, we summarise the case study, relying on interviews with people who were directly involved in the unfolding of events – Juan Carlos Mújika, the then director of the Office; Jesús Juan, who worked in the Prelature’s delegation; and Juan Manuel Mora, Vice Rector for Communication at the University of Navarra.
Philosophy of religion. Psychology of religion. Religion in relation to other subjects, Communication. Mass media
“I Had My Hair Cut Today to Share #Women_Short Cut_Campaign”: Feminist Selfies Protesting Misogyny
Sunah Lee
This study examines the #Women_Short Cut_Campaign movement, a feminist hashtag activism that began on Twitter (rebranded as X in 2023) in 2021. The movement was to defend a South Korean female archer and Olympic gold medalist, An San, from misogynistic attacks that accused her of being a man-hating feminist, given her short hairstyle. Informed by theories about social media’s affordances and affective politics, this article unpacks how women harness social media affordances to combat sexist oppression, particularly in the sociocultural context where women’s hair is fraught with gendered stereotypes and women’s bodies are historically deprived of agency under Neo-Confucian influence. The qualitative textual analysis of 1,849 tweets mostly written in Korean, with a focus on 811 selfies and images, suggests that #Women_Short Cut_Campaign functions as networked, affective counterpublics where oppressed women construct counter-narratives against the attempts to control women’s bodies. The hashtag also challenges the binary of online or offline and stretches the traditional notion of participation by urging digitally networked participants to take action offline. Participants practiced media solidarities by encouraging each other to protect themselves from potential sexual violence. In doing so, they realized affordances for practice through optimizing and contextualizing the original use of technologies. This research contributes to discussions on the sustainability of digital activism and the need for the pluralization and diversification of contemporary feminism. It also offers an opportunity to address the call for decolonial approaches in mobilizing Western-originated theories. Finally, it invites scholars to focus more on the visual in interrogating digital feminist activism.
Communication. Mass media
Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
Royi Rassin, Eran Hirsch, Daniel Glickman
et al.
Text-conditioned image generation models often generate incorrect associations between entities and their visual attributes. This reflects an impaired mapping between linguistic binding of entities and modifiers in the prompt and visual binding of the corresponding elements in the generated image. As one notable example, a query like "a pink sunflower and a yellow flamingo" may incorrectly produce an image of a yellow sunflower and a pink flamingo. To remedy this issue, we propose SynGen, an approach which first syntactically analyses the prompt to identify entities and their modifiers, and then uses a novel loss function that encourages the cross-attention maps to agree with the linguistic binding reflected by the syntax. Specifically, we encourage large overlap between attention maps of entities and their modifiers, and small overlap with other entities and modifier words. The loss is optimized during inference, without retraining or fine-tuning the model. Human evaluation on three datasets, including one new and challenging set, demonstrates significant improvements of SynGen compared with current state-of-the-art methods. This work highlights how making use of sentence structure during inference can efficiently and substantially improve the faithfulness of text-to-image generation.
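The binding objective described in this abstract can be illustrated with a toy calculation. The sketch below is not the SynGen implementation: the function names (`normalize`, `overlap`, `binding_loss`) are hypothetical, attention maps are represented as flat lists of non-negative weights, and histogram intersection is used as the overlap measure purely as an assumption, since the abstract does not say how overlap is computed. It only evaluates a scalar loss; the abstract states that the actual loss is optimized during inference.

```python
def normalize(attn):
    """Scale a flattened, non-negative attention map so that it sums to 1."""
    total = sum(attn)
    return [a / total for a in attn]


def overlap(p, q):
    """Histogram intersection (an assumed overlap measure):
    1.0 for identical normalized maps, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(p, q))


def binding_loss(maps, pairs):
    """maps: dict mapping each word to its flattened attention map.
    pairs: list of (modifier, entity) tuples taken from a syntactic parse.
    Lower is better: bound words should overlap, unrelated words should not."""
    norm = {word: normalize(m) for word, m in maps.items()}
    loss = 0.0
    for modifier, entity in pairs:
        # Reward agreement inside the bound pair.
        loss -= overlap(norm[modifier], norm[entity])
        # Penalize overlap between the pair and every other word.
        others = [w for w in norm if w not in (modifier, entity)]
        if others:
            penalty = sum(
                overlap(norm[w], norm[modifier]) + overlap(norm[w], norm[entity])
                for w in others
            )
            loss += penalty / (2 * len(others))
    return loss


# Toy 4-pixel maps for the prompt "a pink sunflower and a yellow flamingo".
pairs = [("pink", "sunflower"), ("yellow", "flamingo")]
correct = {"pink": [1, 0, 0, 0], "sunflower": [1, 0, 0, 0],
           "yellow": [0, 0, 1, 0], "flamingo": [0, 0, 1, 0]}
swapped = {"pink": [0, 0, 1, 0], "sunflower": [1, 0, 0, 0],
           "yellow": [1, 0, 0, 0], "flamingo": [0, 0, 1, 0]}
print(binding_loss(correct, pairs), binding_loss(swapped, pairs))
```

In this toy setting the correctly bound maps receive a lower loss than the swapped ones, which is the direction of pressure the abstract describes.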
Exploring the Design Space of Extra-Linguistic Expression for Robots
Amy Koike, Bilge Mutlu
In this paper, we explore the new design space of extra-linguistic cues inspired by graphical tropes used in graphic novels and animation to enhance the expressiveness of social robots. To achieve this, we identified a set of cues that can be used to generate expressions, including smoke/steam/fog, water droplets, and bubbles. We prototyped devices that can generate these fluid expressions for a robot and conducted design sessions where eight designers explored the use and utility of the cues in conveying the robot's internal states in various design scenarios. Our analysis of the 22 designs, the associated design justifications, and the interviews with designers revealed patterns in how each cue was used, how they were combined with nonverbal cues, and where the participants drew their inspiration from. These findings informed the design of an integrated module called EmoPack, which can be used to augment the expressive capabilities of any robot platform.
From Local to Global: Navigating Linguistic Diversity in the African Context
Rashmi Margani, Nelson Ndugu
This work focuses on critical problems in NLP related to linguistic diversity and variation across the African continent, specifically with regard to African local dialects and Arabic dialects, which have received little attention. We evaluated our various approaches, demonstrating their effectiveness while highlighting the potential impact of the proposed approach on businesses seeking to improve customer experience and product development in African local dialects. The idea of using the model as a teaching tool for product-based instruction is interesting, as it could potentially stimulate interest in learners and trigger techno-entrepreneurship. Overall, our modified approach offers a promising analysis of the challenges of dealing with African local dialects, particularly Arabic dialects, which could have a significant impact on businesses seeking to improve customer experience and product development.
Exploring Linguistic Properties of Monolingual BERTs with Typological Classification among Languages
Elena Sofia Ruzzetti, Federico Ranaldi, Felicia Logozzo
et al.
The impressive achievements of transformers force NLP researchers to delve into how these models represent the underlying structure of natural language. In this paper, we propose a novel standpoint to investigate the above issue: using typological similarities among languages to observe how their respective monolingual models encode structural information. We aim to layer-wise compare transformers for typologically similar languages to observe whether these similarities emerge for particular layers. For this investigation, we propose to use Centered Kernel Alignment to measure similarity among weight matrices. We found that syntactic typological similarity is consistent with the similarity between the weights in the middle layers, which are the pretrained BERT layers to which syntax encoding is generally attributed. Moreover, we observe that a domain adaptation on semantically equivalent texts enhances this similarity among weight matrices.
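As a rough illustration of the similarity measure named in this abstract, the sketch below computes linear Centered Kernel Alignment between two row-aligned matrices using only the standard library. The abstract does not specify which kernel is used or how weight matrices are aligned, so the linear kernel, the row alignment, and the function names (`center_columns`, `cross_gram_sq_norm`, `linear_cka`) are assumptions made for this example.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every column has zero mean."""
    rows, cols = len(m), len(m[0])
    means = [sum(row[j] for row in m) / rows for j in range(cols)]
    return [[row[j] - means[j] for j in range(cols)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for matrices with the same number of rows."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            dot = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered inputs.
    Undefined (division by zero) if either matrix has only constant columns."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(yc, xc)
    denominator = math.sqrt(cross_gram_sq_norm(xc, xc)) * math.sqrt(
        cross_gram_sq_norm(yc, yc)
    )
    return numerator / denominator


# Toy stand-ins for two weight matrices with matching rows.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
swapped = [[row[1], row[0]] for row in x]   # same columns in a different order
z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 3.0]]
print(linear_cka(x, swapped), linear_cka(x, z))
```

Linear CKA lies between 0 and 1 and is unchanged by column permutation and uniform rescaling, which is why it can compare matrices from separately trained models whose dimensions are not aligned one to one.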
Teach me to play, gamer! Imitative learning in computer games via linguistic description of complex phenomena and decision tree
Clemente Rubio-Manzano, Tomas Lermanda, Claudia Martinez
et al.
In this article, we present a new machine learning model by imitation based on the linguistic description of complex phenomena. The idea consists of, first, capturing the behaviour of human players by creating a computational perception network based on the execution traces of the games and, second, representing it using fuzzy logic (linguistic variables and if-then rules). From this knowledge, a set of data (dataset) is automatically created to generate a learning model based on decision trees. This model will be used later to automatically control the movements of a bot. The result is an artificial agent that mimics the human player. We have implemented, tested and evaluated this technology. The results obtained are interesting and promising, showing that this method can be a good alternative to design and implement the behaviour of intelligent agents in video game development.
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition
Cheng Yi, Shiyu Zhou, Bo Xu
End-to-end models have achieved impressive results on the task of automatic speech recognition (ASR). For low-resource ASR tasks, however, labeled data can hardly satisfy the demand of end-to-end models. Self-supervised acoustic pre-training has already shown strong ASR performance, but the available transcriptions remain inadequate for language modeling in end-to-end models. In this work, we fuse a pre-trained acoustic encoder (wav2vec2.0) and a pre-trained linguistic encoder (BERT) into an end-to-end ASR model. The fused model only needs to learn the transfer from speech to language during fine-tuning on limited labeled data. The lengths of the two modalities are matched by a monotonic attention mechanism without additional parameters. In addition, a fully connected layer is introduced for the hidden mapping between modalities. We further propose a scheduled fine-tuning strategy to preserve and utilize the text context modeling ability of the pre-trained linguistic encoder. Experiments show that our approach makes effective use of the pre-trained modules. Our model achieves better recognition performance on the CALLHOME corpus (15 hours) than other end-to-end models.
Cross-linguistic differences in gender congruency effects: Evidence from meta-analyses
Audrey Bürki, Emiel van den Hoven, Niels O. Schiller
et al.
It has been proposed that the order in which words are prepared for production depends on the speaker's language. When producing the translation equivalent of the small cat, speakers of German or Dutch select the gender-marked determiner at a relatively early stage of production. Speakers of French or Italian postpone the encoding of a determiner or adjective until the phonological form of the noun is available. Hence, even though the words are produced in the same order (e.g., die kleine Katze in German, le petit chat in French), they are not planned in the same order and might require different amounts of advance planning prior to production onset. This distinction between early and late selection languages was proposed to account for the observation that speakers of Germanic and Slavic languages, but not of Romance languages, are slower to name pictures in the context of a distractor word of a different gender. Meta-analyses are conducted to provide the first direct test of this cross-linguistic difference and to test a prediction of the late selection hypothesis. They confirm the existence of the gender congruency effect in Germanic/Slavic languages and its absence in Romance languages when target and distractor words are presented simultaneously. They do not allow confirming the hypothesis that in the latter languages, a similar effect emerges when the presentation of the distractor is delayed. Overall, these analyses confirm the cross-linguistic difference but show that the evidence available to date is not sufficient to confirm or reject the late selection hypothesis as an explanation of this difference. We highlight specific directions for future research.
Data-Driven Techniques in Speech Synthesis. R. I. Damper (editor) (University of Southampton). Boston: Kluwer Academic Publishers, 2001, xviii+316 pp; hardbound, ISBN 0-412-81750-0, $145.00, €148.00, £100.00
Thierry Dutoit
Computational linguistics. Natural language processing
Investigating Negative Wh-Constructions in Persian
Hengameh Vaezi, Akram Razavizadeh
INTRODUCTION: The present study examines a specific type of construction that is not intended to elicit information or receive an answer; instead, the speaker asserts his/her denial or the impossibility of the case. Constructions of this type are called negative Wh-constructions. The purpose of the present study is to identify the features of these constructions in Persian. The research scope is a number of negative constructions whose correctness Persian speakers agree on. Data are analyzed along two dimensions, semantic and pragmatic, based on Cheung (2008, 2009). Examples 1-3 are English negative Wh-examples and 4-5 are Persian ones: (1) Where did he eat anything in the library?! (Kiss, 2015, p. 4) (2) Since when/ *from when/ *when is John watching TV now?! (Cheung, 2009, p. 298) (3) Since when/ *from when/ *when is John a professor?! (Cheung, 2008, p. 48) (4) Koja Mina ketab mi khune?! Where Mina book PRES-read (5) Az key ta hala Maryam qazaye mahali dorost kardan balade?! From when (since) Mary food local cook to be able to. A review of the research literature shows that so far this type of question in Persian has been largely ignored in linguistics, and only rhetorical scholars of poetry and fiction have dealt with it. However, their use is not limited to literature and poetry; they are also used in a variety of Persian colloquial and discourse contexts. Therefore, in this paper, this type of construction is studied based on linguistic principles. We examine which wh-words are used in these Persian sentences, what their special semantic-pragmatic features are, and how they differ from or resemble conventional interrogatives and other similar constructions. Our study has 3 parts: after reviewing the previous studies and presenting the framework, the features of this type of construction are discussed semantically and pragmatically.
We use different tests to determine their characteristics and distinguish them from other constructions such as conventional, emphatic, surprising and rhetorical ones. The final section deals with the results of Persian data and evidence. MATERIALS AND METHODS: The scope of the study consists of a number of negative wh-questions whose correctness Persian speakers agree on. The data have been gathered from speakers’ everyday conversations in natural contexts. They are analyzed along semantic and pragmatic dimensions. The research method is descriptive-analytic. RESULTS AND DISCUSSION: The overall results indicate that, despite the apparent similarity between wh-questions in Persian, negative wh-questions differ from conventional, surprising, emphatic and rhetorical ones. The results show that conventional wh-constructions can be combined with some adverbs, but combining negative wh-constructions with these adverbs leads to ungrammatical constructions. The examination of the data also shows that in Persian, some wh-words like where, when and who are unmarked wh-words in negative wh-constructions. Negative wh-question words do not refer to place, time, etc. Unlike conventional interrogative constructions, negative wh-constructions are largely fixed in form, and their wh-word cannot be changed or replaced by a seemingly synonymous one. Morphologically, negative wh-constructions are restricted to a very limited set of wh-words, and semantically they are used only in contexts that indicate disagreement. The data also show that conventional wh-constructions, depending on the type of wh-word, can be answered with a fragment, whereas negative wh-constructions cannot be answered with a fragment. The examination of Persian data on negative wh-constructions and rhetorical ones shows that both receive a non-interrogative interpretation and that in both the speaker does not seek an answer.
Despite this similarity, negative wh-questions in any context convey the meaning of "at all" and negation, whereas rhetorical questions can express both positive and negative states. Generally, the results show that negative wh-constructions are different from the other constructions mentioned above. CONCLUSION: The semantic-pragmatic study of these constructions shows that the presence of a positive verb, the absence of an expected answer and the limited use of wh-words are special features of these sentences that distinguish them from other similar ones. Syntactic tests including substitution, adjunct doubling, embedding, and negation dominance show that: a) a limited number of wh-words are used in these constructions, so substituting a synonymous wh-word makes them ungrammatical; b) adjunct doubling is acceptable and permissible; c) they are not used in dependent clause positions; d) the dominance of negation in these constructions is one-sided, and only the negation form dominates the whole sentence. The evaluation of syntactic features also shows the distinction between these constructions and the conventional ones.
The Dictionary Of Pacific Alliance Illustrious Personalities: Linguacultural Lexicography And Teaching Spanish
O. Chesnokova
The present article summarizes the results of the research on the illustrious personalities of the four founding member states of the Pacific Alliance; their extra- and intralinguistic parameters are determined from the point of view of their statics and dynamics. The authors analyze and systematize the illustrious personalities of four Latin American Spanish-speaking countries and suggest their own definition of the term “illustrious personality” as an ambivalent denomination of the person and the existing body of knowledge related to the person in the collective memory of the native speakers. In this case, it is the collective memory of the speakers of the corresponding national variants of Spanish, taking into consideration their socio-historical and culturological significance for the corresponding linguacultures, which is reflected in The Dictionary of Pacific Alliance Illustrious Personalities. The authors selected 60 personalities for each country based on their linguacultural, associative and commemorative relevance. The historical, linguistic and semiotic approaches used in the research reveal the characteristics of the national identity of the four countries of the Pacific Alliance. Different types of onomastic identification of the protagonists are established, as well as allusive and intertextual parameters of the corresponding cultures. Thus, systematizing the data about the iconic illustrious personalities of the Pacific Alliance is a transdisciplinary task for contemporary onomastics, Romance philology, general and applied linguistics, and the theory and methodology of foreign language teaching, but it is also an important step towards creating a typology that would reflect different types of onomastic dominants within a society.
Public Apology: Yesterday, Today, Tomorrow
Koshkarova, Natalya Nikolayevna
The article describes the genre of public apology in the diachronic and synchronic aspects. Data from different types of discourse serve as the material for the analysis: belles-lettres, nonconformist, judicial, political, and mass-media. Discourse analysis was the leading method for examining how public apology functions in various types of interaction. The method of contextual study of the communicative situation was also applied. The article defines the communicative aim of the public apology, draws a boundary line between an apology and penitence, and identifies the position of public apology among other genres of judicial discourse. Within the framework of political discourse, the author singles out cognitive and discourse peculiarities of public apology and describes the forms of public apology on the national and international levels. The genre of public apology is viewed as a representation of values in the discourse of new sensibility. Videos in the format of public apology are analyzed as a separate genre mode. The article concludes that the genre approach to the study of communicative practices in the diachronic and synchronic aspects makes it possible not only to describe the current state of the genre nomenclature, but also to forecast the development of genre forms.