CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR
Muhammad Shakeel, Yosuke Fukumoto, Chikara Maeda
et al.
We present CALM, a joint Contextual Acoustic-Linguistic Modeling framework for multi-speaker automatic speech recognition (ASR). In personalized AI scenarios, the joint availability of acoustic and linguistic cues naturally motivates the integration of target-speaker conditioning with contextual biasing in overlapping conversations. CALM implements this integration in an end-to-end framework through speaker embedding-driven target-speaker extraction and dynamic vocabulary-based contextual biasing. We evaluate CALM on simulated English (LibriSpeechMix) and Japanese (Corpus of Spontaneous Japanese mixtures, CSJMix). On two-speaker mixtures, CALM reduces biased word error rate (B-WER) from 12.7 to 4.7 on LibriSpeech2Mix and biased character error rate (B-CER) from 16.6 to 8.4 on CSJMix2 (eval3), demonstrating the effectiveness of joint acoustic-linguistic modeling across languages. We additionally report results on the AMI corpus (IHM-mix condition) to validate performance on standardized speech mixtures.
Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks
Atsuki Yamaguchi, Maggie Mi, Nikolaos Aletras
Language models (LMs) are pre-trained on raw text datasets to generate text sequences token-by-token. While this approach facilitates the learning of world knowledge and reasoning, it does not explicitly optimize for linguistic competence. To bridge this gap, we propose L2T, a pre-training framework integrating Language Learning Tasks alongside standard next-token prediction. Inspired by human language acquisition, L2T transforms raw text into structured input-output pairs to provide explicit linguistic stimulation. Pre-training LMs on a mixture of raw text and L2T data not only improves overall performance on linguistic competence benchmarks but also accelerates the acquisition of linguistic competence, while maintaining competitive performance on general reasoning tasks.
Limited Linguistic Diversity in Embodied AI Datasets
Selma Wanna, Agnes Luhtaru, Jonathan Salfity
et al.
Language plays a critical role in Vision-Language-Action (VLA) models, yet the linguistic characteristics of the datasets used to train and evaluate these systems remain poorly documented. In this work, we present a systematic dataset audit of several widely used VLA corpora, aiming to characterize what kinds of instructions these datasets actually contain and how much linguistic variety they provide. We quantify instruction language along complementary dimensions, including lexical variety, duplication and overlap, semantic similarity, and syntactic complexity. Our analysis shows that many datasets rely on highly repetitive, template-like commands with limited structural variation, yielding a narrow distribution of instruction forms. We position these findings as descriptive documentation of the language signal available in current VLA training and evaluation data, intended to support more detailed dataset reporting, more principled dataset selection, and targeted curation or augmentation strategies that broaden language coverage.
CLEAR: A Comprehensive Linguistic Evaluation of Argument Rewriting by Large Language Models
Thomas Huber, Christina Niklaus
While LLMs have been extensively studied on general text generation tasks, there is less research on text rewriting, a task related to general text generation, and particularly on the behavior of models on this task. In this paper we analyze what changes LLMs make in a text rewriting setting. We focus specifically on argumentative texts and their improvement, a task named Argument Improvement (ArgImp). We present CLEAR: an evaluation pipeline consisting of 57 metrics mapped to four linguistic levels: lexical, syntactic, semantic and pragmatic. This pipeline is used to examine the qualities of LLM-rewritten arguments on a broad set of argumentation corpora and to compare the behavior of different LLMs on this task in terms of linguistic levels. By taking all four linguistic levels into consideration, we find that the models perform ArgImp by shortening the texts while simultaneously increasing average word length and merging sentences. Overall we note an increase in the persuasion and coherence dimensions.
IN SEARCH OF IDENTITY IN THE SPACES OF MEMORY (BASED ON THE NOVELS OF MARIE-FRANCE CLERC AND VICTORIA BELIM)
Ella Mintsys, Nataliia Yatskiv
The article analyzes the problem of identity in the works of writers with Ukrainian roots, specifically in Marie-France Clerc's novel «П'ять майорців для мого незнайомця» ("Five Zinnias for My Stranger") and Victoria Belim's "The Rooster House: A Ukrainian Family Memoir". The study's methodology is the phenomenology of memory proposed by Paul Ricoeur and developed by Aleida Assmann. Through the prism of autobiographical novels, the writers unfold their own memories of their families; through individual memory they seek to make sense of Ukraine's historical reality and to work through the traumatic experience of family secrets in order to articulate their identity and travel the difficult path of self-identification toward finding themselves. In the geopolitical space of multiculturalism, the individual searches for the symbols, markers, and concepts that help maintain the connection with one's ancestors, preserve one's values, and avoid growing embittered or dissolving among diverse cultural, social, and other realities. The choice of the works of Marie-France Clerc and Victoria Belim is motivated by the fact that both authors live through a similar traumatic experience, both structure their works as memoirs, both turn to dramatic moments of Ukrainian history, and both strive, impartially yet no less emotionally, to trace the complex process of self-identification of Ukrainian women, to find reconciliation with themselves, and to accept their Ukrainian roots. Their experience, recorded in the form of literary works, becomes part of the cultural memory of the Ukrainian people and helps reveal the specificity of Ukrainian identity in different languages, thereby conveying to members of other multicultural communities the essence of the Ukrainian tragedy and helping them understand why an ultimate victory in the struggle for independence is necessary.
LMLPA: Language Model Linguistic Personality Assessment
Jingyao Zheng, Xian Wang, Simo Hosio
et al.
Large Language Models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interactions, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs' language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates the findings from previous language-based personality measurement literature. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. An AI rater is therefore needed to transform ambiguous personality information from text responses into clear numerical indicators of personality traits. Utilising Principal Component Analysis and reliability validations, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Computer Interaction and Human-Centered AI, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.
Finetuning Language Models to Emit Linguistic Expressions of Uncertainty
Arslan Chaudhry, Sridhar Thiagarajan, Dilan Gorur
Large language models (LLMs) are increasingly employed in information-seeking and decision-making tasks. Despite their broad utility, LLMs tend to generate information that conflicts with real-world facts, and their persuasive style can make these inaccuracies appear confident and convincing. As a result, end-users struggle to consistently align the confidence expressed by LLMs with the accuracy of their predictions, often leading to either blind trust in all outputs or a complete disregard for their reliability. In this work, we explore supervised finetuning on uncertainty-augmented predictions as a method to develop models that produce linguistic expressions of uncertainty. Specifically, we measure the calibration of pre-trained models and then fine-tune language models to generate calibrated linguistic expressions of uncertainty. Through experiments on various question-answering datasets, we demonstrate that LLMs are well-calibrated in assessing their predictions, and supervised finetuning based on the model's own confidence leads to well-calibrated expressions of uncertainty, particularly for single-claim answers.
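The core idea of emitting calibrated linguistic expressions of uncertainty can be illustrated with a toy mapping from a model's confidence score to a verbal hedge. The thresholds and phrasings below are illustrative placeholders, not the paper's actual scheme:

```python
# Toy sketch: map a calibrated confidence score to a linguistic
# expression of uncertainty. Thresholds and wordings are hypothetical.

def hedge(confidence: float) -> str:
    """Return a verbal uncertainty expression for a probability in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= 0.9:
        return "almost certainly"
    if confidence >= 0.7:
        return "likely"
    if confidence >= 0.5:
        return "possibly"
    return "I am unsure, but perhaps"

def answer_with_uncertainty(answer: str, confidence: float) -> str:
    """Prefix an answer with a hedge matching the model's confidence."""
    return f"It is {hedge(confidence)} {answer}."

print(answer_with_uncertainty("Paris", 0.95))
# -> It is almost certainly Paris.
```

In the paper's setting, such hedged answers would serve as finetuning targets, with the confidence coming from the model's own measured calibration rather than a hand-set table.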
Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination
Eve Fleisig, Genevieve Smith, Madeline Bossi
et al.
We present a large-scale study of linguistic bias exhibited by ChatGPT covering ten dialects of English (Standard American English, Standard British English, and eight widely spoken non-"standard" varieties from around the world). We prompted GPT-3.5 Turbo and GPT-4 with text by native speakers of each variety and analyzed the responses via detailed linguistic feature annotation and native speaker evaluation. We find that the models default to "standard" varieties of English; based on evaluation by native speakers, we also find that model responses to non-"standard" varieties consistently exhibit a range of issues: stereotyping (19% worse than for "standard" varieties), demeaning content (25% worse), lack of comprehension (9% worse), and condescending responses (15% worse). We also find that if these models are asked to imitate the writing style of prompts in non-"standard" varieties, they produce text that exhibits lower comprehension of the input and is especially prone to stereotyping. GPT-4 improves on GPT-3.5 in terms of comprehension, warmth, and friendliness, but also exhibits a marked increase in stereotyping (+18%). The results indicate that GPT-3.5 Turbo and GPT-4 can perpetuate linguistic discrimination toward speakers of non-"standard" varieties.
Decomposed Prompting: Probing Multilingual Linguistic Structure Knowledge in Large Language Models
Ercong Nie, Shuzhou Yuan, Bolei Ma
et al.
Probing the multilingual knowledge of linguistic structure in LLMs, often characterized as sequence labeling, faces challenges with maintaining output templates in current text-to-text prompting strategies. To solve this, we introduce a decomposed prompting approach for sequence labeling tasks. Diverging from the single text-to-text prompt, our prompt method generates for each token of the input sentence an individual prompt which asks for its linguistic label. We test our method on the Universal Dependencies part-of-speech tagging dataset for 38 languages, using both English-centric and multilingual LLMs. Our findings show that decomposed prompting surpasses the iterative prompting baseline in efficacy and efficiency under zero- and few-shot settings. Moreover, our analysis of multilingual performance of English-centric LLMs yields insights into the transferability of linguistic knowledge via multilingual prompting.
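The per-token prompting scheme described above can be sketched as follows; the prompt template is an illustrative placeholder, not the authors' exact wording:

```python
# Sketch of decomposed prompting for sequence labeling: instead of one
# text-to-text prompt covering the whole sentence, emit one prompt per
# token, each asking for that token's linguistic label.
# Template wording is hypothetical.

def decomposed_prompts(sentence: str) -> list[str]:
    """Build one part-of-speech query per whitespace token of the input."""
    tokens = sentence.split()
    return [
        f'Sentence: "{sentence}"\n'
        f'What is the part-of-speech tag of the word "{tok}"?'
        for tok in tokens
    ]

for prompt in decomposed_prompts("Dogs bark loudly"):
    print(prompt)
    print("---")
```

Each prompt yields a single short label, which sidesteps the output-template maintenance problem of asking the model to emit one aligned label sequence for the whole sentence.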
Personalized Text Generation with Fine-Grained Linguistic Control
Bashar Alhafni, Vivek Kulkarni, Dhruv Kumar
et al.
As the text generation capabilities of large language models become increasingly prominent, recent studies have focused on controlling particular aspects of the generated text to make it more personalized. However, most research on controllable text generation focuses on controlling the content or modeling specific high-level/coarse-grained attributes that reflect authors' writing styles, such as formality, domain, or sentiment. In this paper, we focus on controlling fine-grained attributes spanning multiple linguistic dimensions, such as lexical and syntactic attributes. We introduce a novel benchmark to train generative models and evaluate their ability to generate personalized text based on multiple fine-grained linguistic attributes. We systematically investigate the performance of various large language models on our benchmark and draw insights from the factors that impact their performance. We make our code, data, and pretrained models publicly available.
Exploring Linguistic Probes for Morphological Generalization
Jordan Kodner, Salam Khalifa, Sarah Payne
Modern work on the cross-linguistic computational modeling of morphological inflection has typically employed language-independent data splitting algorithms. In this paper, we supplement that approach with language-specific probes designed to test aspects of morphological generalization. Testing these probes on three morphologically distinct languages, English, Spanish, and Swahili, we find evidence that three leading morphological inflection systems employ distinct generalization strategies over conjugational classes and feature sets on both orthographic and phonologically transcribed inputs.
Injecting linguistic knowledge into BERT for Dialogue State Tracking
Xiaohan Feng, Xixin Wu, Helen Meng
Dialogue State Tracking (DST) models often employ intricate neural network architectures, necessitating substantial training data, and their inference process lacks transparency. This paper proposes a method that extracts linguistic knowledge via an unsupervised framework and subsequently utilizes this knowledge to augment BERT's performance and interpretability in DST tasks. The knowledge extraction procedure is computationally economical and does not require annotations or additional training data. The injection of the extracted knowledge can be achieved by the addition of simple neural modules. We employ the Convex Polytopic Model (CPM) as a feature extraction tool for DST tasks and illustrate that the acquired features correlate with syntactic and semantic patterns in the dialogues. This correlation facilitates a comprehensive understanding of the linguistic features influencing the DST model's decision-making process. We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models
Raymond Li, Gabriel Murray, Giuseppe Carenini
In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates are used to determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a fixed small number of steps before pruning the experts based on their importance scores. Our experimental results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we analyze the experts selected by each model at each layer to provide insights for future studies.
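The Gumbel-Softmax gating over parallel expert modules can be sketched in a dependency-free way. The logits, temperature, and scalar expert outputs below are placeholder values for illustration, not the paper's configuration:

```python
import math
import random

# Sketch of Gumbel-Softmax gating over parallel "expert" modules at one
# layer: sample soft gate weights, then mix the expert outputs.

def gumbel_softmax(logits, tau=1.0, rng=random.random):
    """Return soft gate weights: softmax((logits + Gumbel noise) / tau)."""
    noisy = [
        # Gumbel(0, 1) sample is -log(-log(u)) for u ~ Uniform(0, 1);
        # the 1e-20 terms guard against log(0).
        (l - math.log(-math.log(rng() + 1e-20) + 1e-20)) / tau
        for l in logits
    ]
    m = max(noisy)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in noisy]
    z = sum(exps)
    return [e / z for e in exps]

# Gate three linguistic-expert outputs (scalars here for clarity;
# in the model these would be adapter output vectors).
expert_outputs = [0.2, 1.5, -0.7]   # one value per adapter module
gate_logits = [0.1, 2.0, -1.0]      # learned importance scores

weights = gumbel_softmax(gate_logits, tau=0.5)
mixed = sum(w * o for w, o in zip(weights, expert_outputs))
print(weights, mixed)
```

A low temperature `tau` pushes the weights toward one-hot, which is what makes importance-based pruning of low-weight experts meaningful after a few training steps.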
Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model
Yen-Ting Lin, Yun-Nung Chen
In the realm of language models, the nuanced linguistic and cultural intricacies of Traditional Chinese, as spoken in Taiwan, have been largely overlooked. This paper introduces Taiwan LLM, a pioneering Large Language Model that specifically caters to the Traditional Chinese language, with a focus on the variant used in Taiwan. Leveraging a comprehensive pretraining corpus and instruction-finetuning datasets, we have developed a model that not only understands the complexities of Traditional Chinese but also embodies the cultural context of Taiwan. Taiwan LLM represents the first of its kind, a model that is not only linguistically accurate but also culturally resonant with its user base. Our evaluations demonstrate that Taiwan LLM achieves superior performance in understanding and generating Traditional Chinese text, outperforming existing models that are predominantly trained on Simplified Chinese or English. The open-source release of Taiwan LLM invites collaboration and further innovation, ensuring that the linguistic diversity of Chinese speakers is embraced and well-served. The model, datasets, and further resources are made publicly available to foster ongoing research and development in this field.
Mari D’Agostino, NOI CHE SIAMO PASSATI DALLA LIBIA. GIOVANI IN VIAGGIO FRA ALFABETI E MULTILINGUISMO [We Who Came Through Libya: Young People Journeying between Alphabets and Multilingualism]
Maria Rosa Turrisi
Language and Literature, Philology. Linguistics
Kirsten Fermaglich: A Rosenberg by Any Other Name. A History of Jewish Name Changing in America
Tamás Farkas
BASIC VARIETY AND INTERLANGUAGE IN ITALIAN L2. NOTES ON ARABIC SPEAKERS' ITALIAN L2 WRITING
Yahis Martari
The Basic Variety is a “simple, versatile, and highly efficient for most communicative purposes” system (Klein and Perdue, 1997: 304). The main goal of this article is to find out whether L1 transfer phenomena in the Italian L2 of Arabic speakers are acceptable in the Basic Variety (BV) of Italian L2 or whether they should be avoided because they hinder the communicative functionality of BV Italian L2. Starting from a synthetic premise on some characteristics of the Arabic language and on some educational issues regarding the learning of Italian by Arabic speakers (Della Puppa, 2007), we make some observations on the characteristics of the BV and finally focus on the analysis of some texts, produced in Italian L2 by Arabic speakers, that represent a sub-corpus of the VALICO corpus (Corino and Marello, 2017).
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang, Hanwang Zhang, Chongyang Gao
et al.
Humans tend to decompose a sentence into different parts like "sth do sth at someplace" and then fill each part with certain content. Inspired by this, we follow the principle of modular design to propose a novel image captioner: learning to Collocate Visual-Linguistic Neural Modules (CVLNM). Unlike the widely used neural module networks in VQA, where the language (i.e., the question) is fully observable, the task of collocating visual-linguistic modules is more challenging. This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning. To sum up, we make the following technical contributions to design and train our CVLNM: 1) distinguishable module design: four modules in the encoder, including one linguistic module for function words and three visual modules for different content words (i.e., noun, adjective, and verb), and another linguistic module in the decoder for commonsense reasoning; 2) a self-attention based module controller for robustifying the visual reasoning; 3) a part-of-speech based syntax loss imposed on the module controller for further regularizing the training of our CVLNM. Extensive experiments on the MS-COCO dataset show that our CVLNM is more effective, e.g., achieving a new state-of-the-art 129.5 CIDEr-D, and more robust, e.g., being less likely to overfit to dataset bias and suffering less when fewer training samples are available. Code is available at https://github.com/GCYZSL/CVLMN
Did Dog Domestication Contribute to Language Evolution?
Antonio Benítez-Burraco, Daniela Pörtl, Christoph Jung
Different factors seemingly account for the emergence of present-day languages in our species. Human self-domestication has been recently invoked as one important force favoring language complexity mostly via a cultural mechanism. Because our self-domestication ultimately resulted from selection for less aggressive behavior and increased prosocial behavior, any evolutionary or cultural change impacting on aggression levels is expected to have fostered this process. Here, we hypothesize about a parallel domestication of humans and dogs, and more specifically, about a positive effect of our interaction with dogs on human self-domestication, and ultimately, on aspects of language evolution, through the mechanisms involved in the control of aggression. We review evidence of diverse sort (ethological mostly, but also archeological, genetic, and physiological) supporting such an effect and propose some ways of testing our hypothesis.