Word Error Rate (WER) mischaracterizes the performance of ASR models for African languages by collapsing phonological, tonal, and other linguistic errors into a single lexical error. By contrast, Feature Error Rate (FER) has recently attracted attention as a viable metric that reveals linguistically meaningful errors in model performance. In this paper, we evaluate three speech encoders on two African languages, complementing WER with CER and FER and adding a tone-aware extension (TER). We show that by computing errors over phonological features, FER and TER reveal linguistically salient error patterns even when word-level accuracy remains low. Our results show that models perform better on segmental features, while tones (especially mid and downstep) remain the most challenging. Results on Yoruba show a striking differential across metrics, with WER=0.788, CER=0.305, and FER=0.151. Similarly, for Uneme (an endangered language absent from pretraining data), a model with near-total WER and a CER of 0.461 achieves a relatively low FER of 0.267. This indicates that model error is often attributable to individual phonetic feature errors, which all-or-nothing metrics like WER obscure.
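As a rough illustration of why these metrics diverge: WER and CER are the same normalized edit distance computed over different units (words vs. characters), while FER and TER go a step further and score aligned phones on phonological feature vectors. A minimal sketch of the WER/CER half, using hypothetical strings rather than data from the paper:

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)]

def error_rate(ref_units, hyp_units):
    # WER when units are words, CER when units are characters.
    return edit_distance(ref_units, hyp_units) / len(ref_units)

# Toy example (hypothetical, not from the paper's data):
ref, hyp = "bata dudu", "bata duku"
wer = error_rate(ref.split(), hyp.split())  # 1 of 2 words wrong -> 0.5
cer = error_rate(list(ref), list(hyp))      # 1 of 9 chars wrong -> ~0.11
```

A single wrong character flips an entire word under WER (0.5) while barely moving CER (~0.11); scoring phonological features of the misrecognized phone would shrink the penalty further still, which is the pattern the abstract reports.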
Large language models exhibit sycophancy: the tendency to shift outputs toward user-expressed stances, regardless of correctness or consistency. While prior work has studied this issue and its impacts, rigorous computational linguistic metrics are needed to identify when models are being sycophantic. Here, we introduce SWAY, an unsupervised computational linguistic measure of sycophancy. We develop a counterfactual prompting mechanism to measure how much a model's agreement shifts under positive versus negative linguistic pressure, isolating framing effects from content. Applying this metric to benchmark six models, we find that sycophancy increases with epistemic commitment. Leveraging our metric, we introduce a counterfactual mitigation strategy that teaches models to consider what the answer would be if the opposite assumptions were suggested. While a baseline mitigation that explicitly instructs models to be anti-sycophantic yields moderate reductions and can backfire, our counterfactual CoT mitigation drives sycophancy to near zero across models, commitment levels, and clause types, without suppressing responsiveness to genuine evidence. Overall, we contribute a metric for benchmarking sycophancy and a mitigation strategy informed by it.
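The counterfactual probing idea can be illustrated with a toy computation: elicit the model's agreement with the same claims under positive and negative framings, and take the mean gap as a framing-sensitivity score. This is only a sketch of the general idea; the paper's actual SWAY definition may differ, and the numbers below are invented:

```python
def framing_shift(agreement_pos, agreement_neg):
    """Toy framing-sensitivity score: mean gap in agreement rates
    between positively and negatively framed versions of the same
    claims. Illustrative only; not the paper's exact SWAY metric."""
    assert len(agreement_pos) == len(agreement_neg)
    gaps = [p - n for p, n in zip(agreement_pos, agreement_neg)]
    return sum(gaps) / len(gaps)

# Hypothetical agreement rates for the same three claims under
# "I'm certain that..." (pos) vs "I doubt that..." (neg) framings.
pos = [0.90, 0.80, 0.95]
neg = [0.40, 0.30, 0.45]
score = framing_shift(pos, neg)  # 0.5 -> strong framing sensitivity
```

A model whose agreement depends only on content would score near zero; large positive values indicate the framing, not the evidence, is driving the answer.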
Extracting medical decisions from clinical notes is a key step for clinical decision support and patient-facing care summaries. We study how the linguistic characteristics of clinical decisions vary across decision categories and whether these differences explain extraction failures. Using MedDec discharge summaries annotated with decision categories from the Decision Identification and Classification Taxonomy for Use in Medicine (DICTUM), we compute seven linguistic indices for each decision span and analyze span-level extraction recall of a standard transformer model. We find clear category-specific signatures: drug-related and problem-defining decisions are entity-dense and telegraphic, whereas advice and precaution decisions contain more narrative, with higher stopword and pronoun proportions and more frequent hedging and negation cues. On the validation split, exact-match recall is 48%, with large gaps across linguistic strata: recall drops from 58% to 24% from the lowest to highest stopword-proportion bins, and spans containing hedging or negation cues are less likely to be recovered. Under a relaxed overlap-based match criterion, recall increases to 71%, indicating that many errors are span boundary disagreements rather than complete misses. Overall, narrative-style spans--common in advice and precaution decisions--are a consistent blind spot under exact matching, suggesting that downstream systems should incorporate boundary-tolerant evaluation and extraction strategies for clinical decisions.
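The gap between exact-match and overlap-based recall can be made concrete with a small sketch. The spans and the any-overlap criterion below are illustrative assumptions, not the paper's exact evaluation code:

```python
def span_recall(gold_spans, pred_spans, relaxed=False):
    """Span-level recall over half-open character spans [start, end).
    Exact mode requires identical boundaries; relaxed mode counts any
    character overlap (one hedged reading of an overlap criterion)."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    hits = 0
    for g in gold_spans:
        if relaxed:
            hits += any(overlaps(g, p) for p in pred_spans)
        else:
            hits += g in pred_spans
    return hits / len(gold_spans)

# Hypothetical spans: one exact hit, one boundary slip, one full miss.
gold = [(0, 10), (20, 35), (50, 60)]
pred = [(0, 10), (22, 40)]
exact = span_recall(gold, pred)                  # 1/3
relaxed = span_recall(gold, pred, relaxed=True)  # 2/3
```

The boundary slip (22, 40) vs (20, 35) is a miss under exact matching but a hit under overlap, mirroring the abstract's 48% vs 71% recall gap.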
Marco Baroni, Emily Cheng, Iria de-Dios-Flores
et al.
We explore the intrinsic dimension (ID) of LLM representations as a marker of linguistic complexity, asking whether different ID profiles across LLM layers differentially characterize formal and functional complexity. We find that the formal contrast between sentences with multiple coordinated or subordinated clauses is reflected in ID differences whose onset aligns with a phase of more abstract linguistic processing independently identified in earlier work. The functional contrasts between sentences characterized by right branching vs. center embedding, or unambiguous vs. ambiguous relative clause attachment, are also picked up by ID, but less markedly, and they do not correlate with the same processing phase. Further experiments using representational similarity and layer ablation confirm the same trends. We conclude that ID is a useful marker of linguistic complexity in LLMs, that it can differentiate between types of complexity, and that it points to similar stages of linguistic processing across disparate LLMs.
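For readers unfamiliar with intrinsic dimension, a standard estimator in this literature is TwoNN (Facco et al., 2017), which uses only the ratio of each point's two nearest-neighbour distances. A minimal NumPy sketch, illustrative only and not necessarily the estimator or pipeline used in the paper:

```python
import numpy as np

def two_nn_id(X):
    """TwoNN intrinsic-dimension estimate: for each point take
    mu = r2 / r1, the ratio of its second- to first-nearest-neighbour
    distances; the maximum-likelihood ID is N / sum(log mu)."""
    # Pairwise Euclidean distances (fine for small N; use a KD-tree
    # or approximate neighbours for large representation sets).
    diffs = X[:, None, :] - X[None, :, :]
    D = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(D, np.inf)      # exclude self-distance
    D.sort(axis=1)                   # row-wise ascending
    mu = D[:, 1] / D[:, 0]           # second-NN over first-NN
    return len(X) / np.log(mu).sum()

# Sanity check: points on a 2-D plane linearly embedded in 10-D
# ambient space should yield an estimate near the true ID of 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
est = two_nn_id(X)  # close to 2 despite the 10-D ambient space
```

The point of such estimators is exactly the property the abstract exploits: ID tracks the dimensionality of the manifold the representations occupy, not the ambient hidden-state width.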
Josh McGiff, Khanh-Tung Tran, William Mulcahy
et al.
We present Irish-BLiMP (Irish Benchmark of Linguistic Minimal Pairs), the first dataset and framework designed for fine-grained evaluation of linguistic competence in Irish, an endangered language. Drawing on a range of linguistic literature and grammar reference works, a team of fluent Irish speakers manually constructed and reviewed 1,020 minimal pairs across a taxonomy of 11 linguistic features. We evaluate both existing Large Language Models (LLMs) and fluent human participants on their syntactic knowledge of Irish. Our findings show that humans outperform all models across all linguistic features, achieving 16.6% higher accuracy on average. Moreover, a substantial performance gap of 18.1% persists between open- and closed-source LLMs, with even the strongest model (gpt-5) reaching only 73.5% accuracy compared to 90.1% for humans. Interestingly, human participants and models struggle with different aspects of Irish grammar, highlighting a difference in the representations learned by the models. Overall, Irish-BLiMP provides the first systematic framework for evaluating the grammatical competence of LLMs in Irish and offers a valuable benchmark for advancing research on linguistic understanding in low-resource languages.
Javier Alonso Villegas Luis, Marco Antonio Sobrevilla Cabezudo
Linguistic features remain essential for interpretability and tasks that involve style, structure, and readability, but existing Spanish tools offer limited coverage. We present PUCP-Metrix, an open-source and comprehensive toolkit for linguistic analysis of Spanish texts. PUCP-Metrix includes 182 linguistic metrics spanning lexical diversity, syntactic and semantic complexity, cohesion, psycholinguistics, and readability. It enables fine-grained, interpretable text analysis. We evaluate its usefulness on Automated Readability Assessment and Machine-Generated Text Detection, showing competitive performance compared to an existing repository and strong neural baselines. PUCP-Metrix offers a comprehensive and extensible resource for Spanish, supporting diverse NLP applications.
Antonios Dimakis, John Pavlopoulos, Antonios Anastasopoulos
Natural language understanding systems struggle with low-resource languages, including many dialects of high-resource ones. Dialect-to-standard normalization attempts to tackle this issue by transforming dialectal text so that it can be used by standard-language tools downstream. In this study, we tackle this task by introducing a new normalization method that combines rule-based linguistically informed transformations and large language models (LLMs) with targeted few-shot prompting, without requiring any parallel data. We implement our method for Greek dialects and apply it on a dataset of regional proverbs, evaluating the outputs using human annotators. We then use this dataset to conduct downstream experiments, finding that previous results regarding these proverbs relied solely on superficial linguistic information, including orthographic artifacts, while new observations can still be made through the remaining semantics.
This paper explores the interplay of AI language technologies, sign language interpreting, and linguistic access, highlighting the complex interdependencies shaping access frameworks and the tradeoffs these technologies bring. While AI tools promise innovation, they also perpetuate biases, reinforce technoableism, and deepen inequalities through systemic and design flaws. The historical and contemporary privileging of sign language interpreting as the dominant access model, and the broader inclusion ideologies it reflects, shape AI's development and deployment, often sidelining deaf languaging practices and introducing new forms of linguistic subordination to technology. Drawing on Deaf Studies, Sign Language Interpreting Studies, and crip technoscience, this paper critiques the framing of AI as a substitute for interpreters and examines its implications for access hierarchies. It calls for deaf-led approaches to foster AI systems that remain equitable, inclusive, and trustworthy, supporting rather than undermining linguistic autonomy and contributing to deaf-aligned futures.
With the evolution of generative linguistic steganography techniques, conventional steganalysis falls short in robustly quantifying the alterations induced by steganography, complicating detection. Consequently, the research paradigm has pivoted toward deep-learning-based linguistic steganalysis. This study offers a comprehensive review of existing contributions and evaluates prevailing developmental trajectories. Specifically, we first provide a formalized exposition of the general formulas for linguistic steganalysis, comparing this field with the domain of text classification. We then classify existing work into two levels based on vector-space mapping and feature-extraction models, comparing research motivations, model advantages, and other details. A comparative analysis of the experiments assesses performance. Finally, we discuss the challenges faced by this field and propose several directions for future development, along with key issues that urgently need to be addressed.
Image retrieval from contextual descriptions (IRCD) aims to identify an image within a set of minimally contrastive candidates based on linguistically complex text. Despite the success of VLMs, they still lag significantly behind human performance on IRCD. The main challenge lies in aligning key contextual cues across the two modalities, where these subtle cues are concealed in tiny regions of multiple contrastive images and within the complex linguistics of the textual descriptions. This motivates us to propose ContextBLIP, a simple yet effective method that relies on a doubly contextual alignment scheme for challenging IRCD. Specifically, 1) our model comprises a multi-scale adapter, a matching loss, and a text-guided masking loss. The adapter learns to capture fine-grained visual cues, and the two losses provide iterative supervision for it, gradually aligning the focal patches of a single image with the key textual cues. We term this intra-contextual alignment. 2) ContextBLIP then employs an inter-context encoder to learn dependencies among candidates, facilitating alignment between the text and multiple images. We term this step inter-contextual alignment. Consequently, the nuanced cues concealed in each modality can be effectively aligned. Experiments on two benchmarks show the superiority of our method. We observe that ContextBLIP yields results comparable with GPT-4V, despite involving about 7,500 times fewer parameters.
The goal of our research is to automatically retrieve satisfaction and frustration in real-life call-center conversations. This study focuses on an industrial application in which customer satisfaction is continuously tracked to improve customer services. To compensate for the lack of large annotated emotional databases, we explore the use of pre-trained speech representations as a form of transfer learning on the AlloSat corpus. Moreover, several studies have pointed out that emotion can be detected not only in speech but also in facial traits, biological responses, and textual information. In the context of telephone conversations, we can break the audio information down into acoustic and linguistic components by using the speech signal and its transcription. Our experiments confirm the large gain in performance obtained with pre-trained features. Surprisingly, we find that the linguistic content is clearly the major contributor to the prediction of satisfaction and generalizes best to unseen data. Our experiments demonstrate a definitive advantage of CamemBERT representations; however, the benefit of fusing the acoustic and linguistic modalities is less obvious. With models learnt on individual annotations, we find that fusion approaches are more robust to the subjectivity of the annotation task. This study also tackles the problem of performance variability and estimates this variability from different angles: weight initialization, confidence intervals, and annotation subjectivity. A deep analysis of the linguistic content investigates interpretable factors that can explain the high contribution of the linguistic modality to this task.
Tokenization is a critical part of modern NLP pipelines. However, contemporary tokenizers for Large Language Models are based on statistical analysis of text corpora, with little consideration of linguistic features. I propose a linguistically motivated tokenization scheme, MorphPiece, which is based partly on morphological segmentation of the underlying text. A GPT-style causal language model trained on this tokenizer (called MorphGPT) shows comparable or superior performance on a variety of supervised and unsupervised NLP tasks compared to the OpenAI GPT-2 model. Specifically, I evaluate MorphGPT on language modeling tasks; zero-shot performance on the GLUE benchmark with various prompt templates; the Massive Text Embedding Benchmark (MTEB) for supervised and unsupervised performance; and against another morphological tokenization scheme (FLOTA, Hoffmann et al., 2022). I find that the model trained on MorphPiece outperforms GPT-2 on most evaluations, at times by a considerable margin, despite being trained for about half as many iterations.
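To make the idea of linguistically motivated tokenization concrete, here is a toy greedy morpheme segmenter. This is a sketch of morphological segmentation in general, not the actual MorphPiece algorithm (which the abstract describes only as being "based partly" on morphological segmentation), and the morpheme inventory is hypothetical:

```python
def morph_tokenize(word, morphemes):
    """Greedy longest-match segmentation against a morpheme
    inventory, with character fallback for unmatched material.
    Toy illustration only -- not the MorphPiece algorithm."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            if word[i:j] in morphemes:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])         # no morpheme: emit one char
            i += 1
    return tokens

# Hypothetical morpheme inventory.
vocab = {"un", "break", "able", "ing", "happy", "ness"}
print(morph_tokenize("unbreakable", vocab))  # ['un', 'break', 'able']
print(morph_tokenize("breaking", vocab))     # ['break', 'ing']
```

Unlike a frequency-driven BPE merge table, the segments here correspond to morphemes, which is the kind of linguistic alignment the abstract argues statistical tokenizers lack.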
In this study, we present a proposal for authorial teaching material (MDA), in the format of an authorial didactic unit (UDA), focused on the theme of mental health in adolescence. The target audience of the MDA comprises third-year high-school students of Spanish. The MDA was developed following the steps proposed by Leffa (2007) and, in addition, drew on a series of feeder sources (a pilot study, an integrative and documentary review, and a theoretical foundation) that supported its production both empirically and theoretically. Based on the production of the UDA, we conclude that the MDA has empirical grounding and a theoretical basis and can contribute to the Spanish-language classroom.
Yosra Jarrar, Ayodeji O. Awobamise, Gabriel E. Nweke
Body dissatisfaction has become increasingly common among women and young adults, and has only worsened in the digital age, where people have greater access to social media and are in constant competition and comparison with their “friends” across social media platforms. While several studies have examined the relationship between social media and body dissatisfaction, there is a clear dearth of empirical studies on the mediating role of social anxiety, a gap this study aimed to address. Using a cross-sectional research design, this study examined the mediating role of social anxiety in the relationship between social media usage and body dissatisfaction. The sample consisted of 432 students from Kampala International University and Victoria University in Uganda. The findings show a significant positive relationship between social media usage and body dissatisfaction: heavy users of social media are significantly more likely to experience body dissatisfaction. Similarly, the findings show a significant positive relationship between social media usage and social anxiety, suggesting that people who frequently use social media have a much higher chance of suffering from social anxiety, that is, difficulty engaging in social interactions, than people who rarely or moderately use it. Finally, the findings show that social anxiety mediates the relationship between social media usage and body dissatisfaction, indicating that people with high levels of social anxiety are more likely to experience body dissatisfaction as a direct result of heavy social media usage. These findings imply that although heavy users of social media tend to have a more negative perception of their bodies, if these same users can engage properly in social interactions, the negative effects of social media usage on body dissatisfaction might be mitigated.
Students nowadays no longer enjoy reading printed books; they mostly read from their mobile phones or computer screens. Teachers need to encourage students to read and comprehend texts, and should adopt and adapt technology in teaching reading comprehension in light of its rapidly changing nature. The purpose of this research is to develop appropriate media for learning reading. The research employed a Research and Development design based on the ADDIE model, which comprises five stages: 1) Analysis, 2) Design, 3) Development, 4) Implementation, and 5) Evaluation. The research subjects were tenth graders and their English teacher. Both media and language experts validated the developed product. The instruments used in the needs analysis were questionnaires for the students and interviews with the English teacher. The final result is a mobile application that suits the needs of teachers and students. The media consists of 63 slides, and the application's size is 52 MB. The developed application comprises three main menus and a set of sub-menus: an intro page, loading section, menu page, materials page, core competence page, vocabulary building page, learning summary page, exercise page, glossary page, verb form page, user's guide page, lesson plan page, product description page, and exit page. The results of this research could be seen in the students' enthusiasm, development, and interest in the mobile-application media. Therefore, teachers are recommended to use mobile applications in teaching reading, and other developers to build applications for the learning process.
Objects in a scene are not always related. The execution efficiency of one-stage scene graph generation approaches is quite high: they infer the effective relation between entity pairs using sparse proposal sets and a few queries. However, they focus only on the relation between subject and object in the triplet ⟨subject entity, predicate entity, object entity⟩, ignoring the relations between subject and predicate or between predicate and object, and the models lack self-reasoning ability. In addition, the linguistic modality has been neglected in one-stage methods; mining linguistic-modality knowledge is necessary to improve model reasoning ability. To address these shortcomings, a Self-reasoning Transformer with Visual-linguistic Knowledge (SrTR) is proposed to add flexible self-reasoning ability to the model. SrTR adopts an encoder-decoder architecture, with a self-reasoning decoder that completes three inferences over the triplet: s+o→p, s+p→o, and p+o→s. Inspired by large-scale pre-trained image-text foundation models, visual-linguistic prior knowledge is introduced, and a visual-linguistic alignment strategy is designed to project visual representations into semantic spaces with prior knowledge to aid relational reasoning. Experiments on the Visual Genome dataset demonstrate the superiority and fast inference of the proposed method.
Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho
et al.
Natural language is an intuitive and expressive way to communicate reward information to autonomous agents. It encompasses everything from concrete instructions to abstract descriptions of the world. Despite this, natural language is often challenging to learn from: it is difficult for machine learning methods to make appropriate inferences from such a wide range of input. This paper proposes a generalization of reward design as a unifying principle to ground linguistic communication: speakers choose utterances to maximize expected rewards from the listener's future behaviors. We first extend reward design to incorporate reasoning about unknown future states in a linear bandit setting. We then define a speaker model which chooses utterances according to this objective. Simulations show that short-horizon speakers (reasoning primarily about a single, known state) tend to use instructions, while long-horizon speakers (reasoning primarily about unknown, future states) tend to describe the reward function. We then define a pragmatic listener which performs inverse reward design by jointly inferring the speaker's latent horizon and rewards. Our findings suggest that this extension of reward design to linguistic communication, including the notion of a latent speaker horizon, is a promising direction for achieving more robust alignment outcomes from natural language supervision.
An analysis of the signifiers that constructed the sense of a duality between in-person and virtual schooling in Argentina in 2021. In-person attendance as a virtue is associated with the school as the ideal space for the development of the child and adolescent psyche. Virtuality is criticized as a technical mediation that impedes the natural process of socialization that forms the self. The article posits how in-person schooling is articulated with education as a nationalist ideal, characteristic of the Enlightenment philosophy of progress.
Permanent Identifier (ARK): http://id.caicyt.gov.ar/ark:/s18535925/mxvvroiaf
The article describes the main contemporary approaches to studying mass media language. Owing to its global reach and influence on all spheres of communication, the functioning of media language has become a central topic of both Russian and international scholarship. The dominant cognitive-discourse paradigm interprets a text in relation to the communicative situation and the cognitive characteristics of the participants in that communication, making it possible to study mass media texts as media discourse, which encompasses various aspects of language use in print and electronic media as well as the characteristics of the communication channel: television, radio, newspapers, and the Internet. As a result, it becomes necessary not only to establish new grounds for classifying media texts but also to define methods of analysis and describe their influence on other spheres of language use. The main aim of the article is to analyze the most widespread approaches to studying mass media language in contemporary Russian linguistics, leading to an understanding of current trends in the study of such texts. The author focuses on media linguistics, media stylistics, and the discourse-sphere, the most prominent frameworks for interpreting mass media language in the scholarly community, which have produced a great number of articles and monographs both by the authors of these theories and by their adherents. The result is a systematic description of these theories, their classification, and their characteristic methods of investigation.
In recent years, we have witnessed an increase in the space devoted to audience-produced content in the media and a growing integration of this content into spaces that had until now been reserved for professionals. This has attracted researchers' interest, and studies are now abundant on, among other topics, the presence of participation mechanisms in online media, journalists' attitudes toward user participation, and the analysis of the quality of audience participation and its contribution to the development of a richer public sphere. The main objective of this work is to review the state of the art of research on participatory journalism. The study centers on an analysis of the main approaches produced in this area. The literature review makes it possible to trace the evolution of the research approaches used over the last fifteen years.