Results for "Philology. Linguistics"

Showing 20 of ~631,394 results · from CrossRef, DOAJ, arXiv

arXiv Open Access 2026
To Write or to Automate Linguistic Prompts, That Is the Question

Marina Sánchez-Torrón, Daria Akselrod, Jason Rauchwerk

LLM performance is highly sensitive to prompt design, yet whether automatic prompt optimization can replace expert prompt engineering in linguistic tasks remains unexplored. We present the first systematic comparison of hand-crafted zero-shot expert prompts, base DSPy signatures, and GEPA-optimized DSPy signatures across translation, terminology insertion, and language quality assessment, evaluating five model configurations. Results are task-dependent. In terminology insertion, optimized and manual prompts produce mostly statistically indistinguishable quality. In translation, each approach wins on different models. In LQA, expert prompts achieve stronger error detection while optimization improves characterization. Across all tasks, GEPA elevates minimal DSPy signatures, and the majority of expert-optimized comparisons show no statistically significant difference. We note that the comparison is asymmetric: GEPA optimization searches programmatically over gold-standard splits, whereas expert prompts require in principle no labeled data, relying instead on domain expertise and iterative refinement.

en cs.CL
arXiv Open Access 2026
Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence

Laurent Bonnasse-Gahot, Christophe Pallier

When humans and large language models (LLMs) process the same text, activations in the LLMs correlate with brain activity measured, e.g., with functional magnetic resonance imaging (fMRI). Moreover, it has been shown that, as the training of an LLM progresses, the performance in predicting brain activity from its internal activations improves more in the left hemisphere than in the right one. The aim of the present work is to understand which kind of competence acquired by the LLMs underlies the emergence of this left-right asymmetry. Using the OLMo-2 7B language model at various training checkpoints and fMRI data from English participants, we compare the evolution of the left-right asymmetry in brain scores alongside performance on several benchmarks. We observe that the asymmetry co-emerges with the formal linguistic abilities of the LLM. These abilities are demonstrated in two ways: by the model's capacity to assign a higher probability to an acceptable sentence than to a grammatically unacceptable one within a minimal contrasting pair, or by its ability to produce well-formed text. By contrast, the left-right asymmetry does not correlate with the performance on arithmetic or Dyck language tasks, nor with text-based tasks involving world knowledge and reasoning. We generalize these results to another family of LLMs (Pythia) and another language, namely French. Our observations indicate that the left-right asymmetry in brain predictivity matches the progress in formal linguistic competence (knowledge of linguistic patterns).

en cs.CL, cs.AI
arXiv Open Access 2026
Understanding Emotion in Discourse: Recognition Insights and Linguistic Patterns for Generation

Cheonkam Jeong, Adeline Nyamathi

Despite strong recent progress in Emotion Recognition in Conversation (ERC), two gaps remain: we lack a clear understanding of which modeling choices materially affect performance, and we have limited linguistic analysis linking recognition findings to actionable generation cues. We address both via a systematic study on IEMOCAP. For recognition, we conduct controlled ablations with 10 random seeds and paired tests (with correction for multiple comparisons), yielding three findings. First, conversational context is dominant: performance saturates quickly, with roughly 90% of the gain achieved using only the most recent 10-30 preceding turns. Second, hierarchical sentence representations improve utterance-only recognition (K=0), but the benefit vanishes once turn-level context is available, suggesting conversational history subsumes intra-utterance structure. Third, integrating an external affective lexicon (SenticNet) does not improve results, consistent with pretrained encoders already capturing affective signal. Under a strictly causal (past-only) setting, our simple models attain strong performance (82.69% 4-way; 67.07% 6-way weighted F1). For linguistic analysis, we examine 5,286 discourse-marker occurrences and find a reliable association between emotion and marker position (p < 0.0001). Sad utterances show reduced left-periphery marker usage (21.9%) relative to other emotions (28-32%), aligning with accounts linking left-periphery markers to active discourse management. This pattern is consistent with Sad benefiting most from conversational context (+22%p), suggesting sadness relies more on discourse history than on overt pragmatic signaling.

en cs.CL, cs.AI
DOAJ Open Access 2025
Inebriety of the soul: Aphoristica of passion (semantics, pragmatics, genre properties)

Vorkachev, Sergey Grigorievich

Based on the material of a corpus of short-format statements, the semantics and axiology of passion, as well as the discursive properties of the aphorism, are studied. It is established that the aphorism has no clear definition based on a single feature; one can speak only of a "family resemblance": the proximity of a small-format text, in terms of a set of features, to the ideal of an aphoristic statement. The corpus of aphorisms about passion is dominated by an axiological feature: every third statement about passion contains an assessment of it. Most often passion is treated as a sin and a vice; somewhat less often it receives a positive assessment, being regarded as spiritual wealth and a source of fruitful activity; still less often a compromise view is observed: passion is recognized as beneficial when its intensity does not exceed the limits set by reason and will, and as harmful when it goes beyond those limits and takes possession of a person. The praxeology of passion in the aphorism comes down to recommendations to subordinate passions to one's will and learn to control them. Among the distinctive semantic features of passion in the aphorism are its omnipotence and insubordination to the will, as well as the innateness and ineradicability of passions. Among the distinctive features of the aphorism itself in statements about passion, imagery clearly predominates: metaphor is present in almost every fourth aphorism, and the dominant type of metaphorical transfer is the pyro/thermometaphor: combustion and temperature fluctuations figure in every third metaphor of passion. Another distinctive feature, antithesis, is present in every sixth statement about passion, where passion is most often contrasted with reason. A further distinctive feature, paradox, appears relatively rarely in statements about passion.

Philology. Linguistics
DOAJ Open Access 2025
Cognitive mechanisms of semantic adaptation of borrowings in Russian

Vadim A. Belov, Valentina M. Belova

This paper investigates the semantic adaptation of new borrowings in Russian, addressing a gap in research on this topic. The relevance of the study stems from the ambiguous and often negative public perception of the increasing number of borrowings in the Russian language. The study aims to classify these borrowings and identify the underlying causes of the borrowing process. The central hypothesis is that the semantic adaptation of borrowings is determined by the types of cognitive categorization employed. Two types of categorization are described: logical categorization and non-logical categorization. Logical categorization involves the rational structuring of phenomena within a native speaker’s worldview, while non-logical categorization reflects an emotional-evaluative perception of reality, expressing various emotions through borrowings. The research draws on several sources, including the National Corpus of the Russian Language, media publications, Russian dictionaries, statistical data from the Yandex search engine, and the results of a psycholinguistic experiment involving 106 native speakers. In the experiment, participants were tasked with interpreting stimulus words, providing insights into how borrowings are understood and categorized. The study's primary outcome is a typology of borrowings, differentiated by the organization of their semantic relationships. The typology includes borrowings that denote new phenomena in reality, borrowings used as substitutes for synonymous native phrases, and borrowing-doublets. Logical categorization establishes hierarchical semantic connections for borrowings, while non-logical categorization conveys emotional and evaluative attitudes. The findings indicate that logical categorization is predominant for most borrowings, whereas non-logical categorization applies to a smaller subset. In some cases, these types of categorization can co-occur. The study concludes that cognitive categorization plays a crucial role in the semantic adaptation of borrowings, offering new insights into their integration into the Russian language.

Philology. Linguistics
DOAJ Open Access 2025
“Onde você mora?” (“Where do you live?”): a hodonymic study of the public thoroughfares of the municipality of Farroupilha/RS

Jaqueline Biazus, Kleber Eckert

Knowing and understanding the motivations behind the choice of certain names for a given place is a way of studying the social, cultural, and historical factors that are part of it and, therefore, of preserving significant and particular memories of the people who live there. Thus, the main objective of this article is to analyze, from a historical, linguistic, and sociocultural perspective, the names given to the streets, avenues, and lanes of the municipality of Farroupilha, located in Rio Grande do Sul. To this end, we carry out a documentary study of the historical and sociocultural aspects, with emphasis on migratory processes, as well as on the current characteristics of the municipality of Farroupilha, that is, from the occupation of the territory by Italian immigrants in the nineteenth century to the city's contemporary development; a bibliographic review of the field of toponymy based mainly on authors such as Dauzat (1947), Seabra (2006), Marcato (2009), Frosi (2009), Zamariano (2012), and Isquerdo (2019); a classification of the hodonyms according to previously defined taxonomies, based on Dick (1990); and an analysis of the motivation behind the naming of each thoroughfare. After completing each of these stages, we reached some conclusions: in the act of naming, the public administration almost exclusively honors local people, that is, those who contributed to the construction and development of the municipality; it also imprints history and culture, especially that of the Italian immigrants, on the names given to the thoroughfares. Finally, this study contributes to building knowledge about urban microtoponymy in the Italian Colonization Region of Rio Grande do Sul and offers the community studied an interpretation of its street names.

Auxiliary sciences of history, Philology. Linguistics
arXiv Open Access 2025
Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts

Zain Muhammad Mujahid, Dilshod Azizov, Maha Tufail Agro et al.

In an age characterized by the proliferation of mis- and disinformation online, it is critical to empower readers to understand the content they are reading. Important efforts in this direction rely on manual or automatic fact-checking, which can be challenging for emerging claims with limited information. Such scenarios can be handled by assessing the reliability and the political bias of the source of the claim, i.e., characterizing entire news outlets rather than individual claims or articles. This is an important but understudied research direction. Unlike prior work, which has looked into linguistic and social contexts, we do not analyze individual articles or content on social media. Instead, we propose a novel methodology that emulates the criteria that professional fact-checkers use to assess the factuality and political bias of an entire outlet. Specifically, we design a variety of prompts based on these criteria and elicit responses from large language models (LLMs), which we aggregate to make predictions. In addition to demonstrating sizable improvements over strong baselines via extensive experiments with multiple LLMs, we provide an in-depth error analysis of the effect of media popularity and region on model performance. Further, we conduct an ablation study to highlight the key components of our dataset that contribute to these improvements. To facilitate future research, we release our dataset and code at https://github.com/mbzuai-nlp/llm-media-profiling.

en cs.CL, cs.AI
arXiv Open Access 2025
A Taxonomy of Linguistic Expressions That Contribute To Anthropomorphism of Language Technologies

Alicia DeVrio, Myra Cheng, Lisa Egede et al.

Recent attention to anthropomorphism -- the attribution of human-like qualities to non-human objects or entities -- of language technologies like LLMs has sparked renewed discussions about potential negative impacts of anthropomorphism. To productively discuss the impacts of this anthropomorphism and in what contexts it is appropriate, we need a shared vocabulary for the vast variety of ways that language can be anthropomorphic. In this work, we draw on existing literature and analyze empirical cases of user interactions with language technologies to develop a taxonomy of textual expressions that can contribute to anthropomorphism. We highlight challenges and tensions involved in understanding linguistic anthropomorphism, such as how all language is fundamentally human and how efforts to characterize and shift perceptions of humanness in machines can also dehumanize certain humans. We discuss ways that our taxonomy supports more precise and effective discussions of and decisions about anthropomorphism of language technologies.

en cs.HC, cs.AI
arXiv Open Access 2024
Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models

Jinhua Zhu, Javier Conde, Zhen Gao et al.

The wide adoption of large language models (LLMs) makes their dependability a pressing concern. Detection of errors is the first step to mitigating their impact on a system; thus, efficient error detection for LLMs is an important issue. In many settings, the LLM is considered a black box with no access to its internal nodes; this prevents the use of many error detection schemes that need such access. An interesting observation is that the output of LLMs in error-free operation should be valid and normal text. Therefore, when the text is not valid or differs significantly from normal text, it is likely that there is an error. Based on this observation, we propose to perform Concurrent Linguistic Error Detection (CLED); this scheme extracts linguistic features of the text generated by the LLM and feeds them to a concurrent classifier that detects errors. Since the proposed error detection mechanism relies only on the outputs of the model, it can be used on LLMs to which there is no access to the internal nodes. The proposed CLED scheme has been evaluated on the T5 model when used for news summarization and on the OPUS-MT model when used for translation. In both cases, the same set of linguistic features has been used for error detection to illustrate the applicability of the proposed scheme beyond a specific case. The results show that CLED can detect most of the errors at a low overhead penalty. The use of the concurrent classifier also enables a trade-off between error detection effectiveness and its associated overhead, thus providing flexibility to the designer.
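The pipeline the abstract describes (extract surface linguistic features from the generated text, then feed them to a classifier) can be sketched as follows. The features and the nearest-centroid rule below are our own illustrative stand-ins, not the paper's actual feature set or classifier:

```python
def linguistic_features(text):
    """Toy surface features (illustrative only): average word length,
    repeated-token ratio, and fraction of non-alphabetic tokens."""
    words = text.split()
    n = max(len(words), 1)
    return (
        sum(len(w) for w in words) / n,            # average word length
        1 - len(set(words)) / n,                   # repetition ratio
        sum(not w.isalpha() for w in words) / n,   # non-alphabetic ratio
    )

def detect_error(text, normal_examples, error_examples):
    """Flag the text if its features lie closer to the centroid of
    known-erroneous outputs than to the centroid of normal outputs."""
    def centroid(texts):
        feats = [linguistic_features(t) for t in texts]
        return tuple(sum(f[i] for f in feats) / len(feats) for i in range(3))
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    f = linguistic_features(text)
    return sq_dist(f, centroid(error_examples)) < sq_dist(f, centroid(normal_examples))

# Tiny made-up pools of normal vs. erroneous model outputs.
normal = ["the cat sat on the mat", "a quick brown fox jumps over it"]
errors = ["mat mat mat ### mat mat", "!!! ??? the the the the"]
print(detect_error("dog dog dog dog !!!", normal, errors))        # True
print(detect_error("birds sing in the morning", normal, errors))  # False
```

Because the check reads only the generated text, it applies to black-box LLMs exactly as the abstract argues; in practice a trained concurrent classifier would replace the centroid rule.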

en cs.AI, cs.CL
arXiv Open Access 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering

Sacha Muller, António Loison, Bilel Omrani et al.

Retrieval-Augmented Generation (RAG) has emerged as a common paradigm to use Large Language Models (LLMs) alongside private and up-to-date knowledge bases. In this work, we address the challenges of using LLM-as-a-Judge when evaluating grounded answers generated by RAG systems. To assess the calibration and discrimination capabilities of judge models, we identify 7 generator failure modes and introduce GroUSE (Grounded QA Unitary Scoring of Evaluators), a meta-evaluation benchmark of 144 unit tests. This benchmark reveals that existing automated RAG evaluation frameworks often overlook important failure modes, even when using GPT-4 as a judge. To improve on the current design of automated RAG evaluation frameworks, we propose a novel pipeline and find that while closed models perform well on GroUSE, state-of-the-art open-source judges do not generalize to our proposed criteria, despite strong correlation with GPT-4's judgement. Our findings suggest that correlation with GPT-4 is an incomplete proxy for the practical performance of judge models and should be supplemented with evaluations on unit tests for precise failure mode detection. We further show that finetuning Llama-3 on GPT-4's reasoning traces significantly boosts its evaluation capabilities, improving upon both correlation with GPT-4's evaluations and calibration on reference situations.

en cs.CL
arXiv Open Access 2024
Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain

Eugene Jang, Jian Cui, Dayeon Yim et al.

Cybersecurity information is often technically complex and relayed through unstructured text, making automation of cyber threat intelligence highly challenging. For such text domains that involve high levels of expertise, pretraining on in-domain corpora has been a popular method for language models to obtain domain expertise. However, cybersecurity texts often contain non-linguistic elements (such as URLs and hash values) that may be ill-suited to established pretraining methodologies. Previous work in other domains has removed or filtered such text as noise, but the effectiveness of these methods has not been investigated, especially in the cybersecurity domain. We propose different pretraining methodologies and evaluate their effectiveness through downstream tasks and probing tasks. Our proposed strategy (selective MLM and jointly training NLE token classification) outperforms the commonly taken approach of replacing non-linguistic elements (NLEs). We use our domain-customized methodology to train CyBERTuned, a cybersecurity domain language model that outperforms other cybersecurity PLMs on most tasks.

en cs.CR, cs.CL
arXiv Open Access 2024
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

Andrew M. Bean, Simi Hellsten, Harry Mayne et al.

In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. Using challenging Linguistic Olympiad puzzles, we evaluate (i) capabilities for in-context identification and generalisation of linguistic patterns in very low-resource or extinct languages, and (ii) abilities to follow complex task instructions. The LingOly benchmark covers more than 90 mostly low-resource languages, minimising issues of data contamination, and contains 1,133 problems across 6 formats and 5 levels of human difficulty. We assess performance with both direct accuracy and comparison to a no-context baseline to penalise memorisation. Scores from 11 state-of-the-art LLMs demonstrate the benchmark to be challenging, and models perform poorly on the higher difficulty problems. On harder problems, even the top model only achieved 38.7% accuracy, a 24.7% improvement over the no-context baseline. Large closed models typically outperform open models, and in general, the higher resource the language, the better the scores. These results indicate that, in the absence of memorisation, true multi-step out-of-domain reasoning remains a challenge for current language models.
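The scoring idea (direct accuracy plus a no-context baseline to penalise memorisation) can be sketched as below; the gold answers and model predictions are invented for illustration:

```python
def exact_match_accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold answers."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def context_delta(preds_with_context, preds_no_context, golds):
    """Improvement over the no-context baseline, in percentage points.
    Gains a model achieves without seeing the puzzle context are
    attributed to memorisation and cancel out of this score."""
    return (exact_match_accuracy(preds_with_context, golds)
            - exact_match_accuracy(preds_no_context, golds)) * 100

golds = ["kulta", "talo", "vesi", "kivi"]       # gold answers (invented)
with_ctx = ["kulta", "talo", "vesi", "maa"]     # 3/4 correct with the puzzle shown
no_ctx = ["kulta", "maa", "maa", "maa"]         # 1/4 correct without it
print(context_delta(with_ctx, no_ctx, golds))   # 50.0
```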

en cs.CL
arXiv Open Access 2023
A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing

Nimrod Dvir, Elaine Friedman, Suraj Commuri et al.

This study introduces and empirically tests a novel predictive model for digital information engagement (IE) - the READ model, an acronym for the four pivotal attributes of engaging information: Representativeness, Ease-of-use, Affect, and Distribution. Conceptualized within the theoretical framework of Cumulative Prospect Theory, the model integrates key cognitive biases with computational linguistics and natural language processing to develop a multidimensional perspective on information engagement. A rigorous testing protocol was implemented, involving 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database. These words' engagement levels were evaluated through a large-scale online survey (n = 80,500) to derive empirical IE metrics. The READ attributes for each word were then computed and their predictive efficacy examined. The findings affirm the READ model's robustness, accurately predicting a word's IE level and distinguishing the more engaging word from a pair of synonyms with an 84% accuracy rate. The READ model's potential extends across various domains, including business, education, government, and healthcare, where it could enhance content engagement and inform AI language model development and generative text work. Future research should address the model's scalability and adaptability across different domains and languages, thereby broadening its applicability and efficacy.

en cs.HC, cs.CL
arXiv Open Access 2023
Mathematical and Linguistic Characterization of Orhan Pamuk's Nobel Works

Taner Arsan, Sehnaz Sismanoglu Simsek, Onder Pekcan

In this study, the works of Nobel Laureate Orhan Pamuk are chosen as examples of Turkish literature. By counting the numbers of letters and words in his texts, we find it possible to study his works statistically. It is known that there is a geometric order in text structures. Here a method based on the basic assumption of fractal geometry is introduced for calculating the fractal dimensions of Pamuk's texts. The results are compared with applications of Zipf's law, which is successfully applied to letters and words, and two concepts, namely Zipf's dimension and Zipf's order, are introduced. The Zipf dimension of the novel My Name is Red is found to differ markedly from that of his other novels. However, it is linguistically observed that there is no fundamental difference between his corpora. The results are interpreted in terms of fractal dimensions and the Turkish language.

en cs.CL
arXiv Open Access 2021
Not All Models Localize Linguistic Knowledge in the Same Place: A Layer-wise Probing on BERToids' Representations

Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi et al.

Most of the recent works on probing representations have focused on BERT, with the presumption that the findings might be similar for the other models. In this work, we extend the probing studies to two other models in the family, namely ELECTRA and XLNet, showing that variations in the pre-training objectives or architectural choices can result in different behaviors in encoding linguistic information in the representations. Most notably, we observe that ELECTRA tends to encode linguistic knowledge in the deeper layers, whereas XLNet concentrates it in the earlier layers. Also, the former model undergoes a slight change during fine-tuning, whereas the latter experiences significant adjustments. Moreover, we show that drawing conclusions based on the weight mixing evaluation strategy -- which is widely used in the context of layer-wise probing -- can be misleading given the norm disparity of the representations across different layers. Instead, we adopt an alternative information-theoretic probing with minimum description length, which has recently been proven to provide more reliable and informative results.

en cs.CL, cs.AI
arXiv Open Access 2021
How is BERT surprised? Layerwise detection of linguistic anomalies

Bai Li, Zining Zhu, Guillaume Thomas et al.

Transformer language models have shown remarkable ability in detecting when a word is anomalous in context, but likelihood scores offer no information about the cause of the anomaly. In this work, we use Gaussian models for density estimation at intermediate layers of three language models (BERT, RoBERTa, and XLNet), and evaluate our method on BLiMP, a grammaticality judgement benchmark. In lower layers, surprisal is highly correlated with low token frequency, but this correlation diminishes in upper layers. Next, we gather datasets of morphosyntactic, semantic, and commonsense anomalies from psycholinguistic studies; we find that the best-performing model, RoBERTa, exhibits surprisal in earlier layers when the anomaly is morphosyntactic than when it is semantic, while commonsense anomalies do not exhibit surprisal at any intermediate layer. These results suggest that language models employ separate mechanisms to detect different types of linguistic anomalies.
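The layerwise method (fit a Gaussian density to in-distribution embeddings at a given layer, then score new tokens by negative log density, i.e. "surprisal") can be sketched as follows. For brevity this uses a diagonal covariance and toy 2-d vectors, whereas the paper's models operate on full transformer hidden states:

```python
import math

def fit_diag_gaussian(embeddings):
    """Per-dimension mean and variance of in-distribution embeddings."""
    d, n = len(embeddings[0]), len(embeddings)
    mean = [sum(e[i] for e in embeddings) / n for i in range(d)]
    var = [max(sum((e[i] - mean[i]) ** 2 for e in embeddings) / n, 1e-6)
           for i in range(d)]
    return mean, var

def surprisal(embedding, mean, var):
    """Negative log density under the fitted diagonal Gaussian; higher
    values mean the embedding is more anomalous at this layer."""
    return 0.5 * sum(
        math.log(2 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(embedding, mean, var)
    )

# Toy "layer embeddings": in-distribution tokens cluster near the origin.
train = [(0.1, -0.2), (-0.1, 0.1), (0.0, 0.2), (0.2, 0.0), (-0.2, -0.1)]
mean, var = fit_diag_gaussian(train)
print(surprisal((0.0, 0.0), mean, var) < surprisal((3.0, 3.0), mean, var))  # True
```

Repeating the fit at each intermediate layer yields the per-layer surprisal profiles on which the paper's comparisons rest.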

en cs.CL
DOAJ Open Access 2020
#T1DLooksLikeMe: Exploring Self-Disclosure, Social Support, and Type 1 Diabetes on Instagram

Bree E. Holtz, Shaheen Kanthawala

Type 1 diabetes (T1D) is diagnosed mostly during childhood or adolescence. During such transitional phases of life, having support from others in similar situations can reduce feelings of isolation and loneliness. Instagram, a platform with high use among teens and young adults, acts as an alternative to traditional online health communities. To better understand individuals' self-disclosure on Instagram related to T1D, we conducted an exploratory quantitative content analysis of a sample of 423 posts using the hashtag #t1dlookslikeme. These posts were collected using Netlytic between July and October 2018. Our research questions asked about the types of hashtags used, the content of the images, the sentiments of the posts, the relationship between post engagement and post sentiment, whether and how the posts represented self-disclosure, and the presence of social support. A codebook containing 43 items on the image and 10 codes for captions was created for this study, and all data were analyzed using SPSS. Our dataset included 89% images compared to 6.4% video clips. Additionally, 83.5% of the posts were personal images, whereas 11.6% were categorized as memes. We noted the most popular hashtags and other characteristics of the images used by individuals to self-disclose their T1D. Overall, our random sample contained more positive than negative sentiment posts, and the positive sentiment posts were correlated with a higher number of hashtags per post, indicating a possible connection between self-disclosure and positive sentiment. This finding also reflected elements of empowerment (such as taking the "power" away from T1D and returning it to themselves), which is also discussed.

Communication. Mass media
arXiv Open Access 2020
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Wenhai Wang, Xuebo Liu, Xiaozhong Ji et al.

Scene text spotting aims to detect and recognize the entire word or sentence with multiple characters in natural images. It is still challenging because ambiguity often occurs when the spacing between characters is large or the characters are evenly spread in multiple rows and columns, making many visually plausible groupings of the characters (e.g. "BERLIN" is incorrectly detected as "BERL" and "IN" in Fig. 1(c)). Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection. The proposed AE TextSpotter has three important benefits. 1) The linguistic representation is learned together with the visual representation in a single framework. To our knowledge, this is the first work to improve text detection by using a language model. 2) A carefully designed language module is utilized to reduce the detection confidence of incorrect text lines, making them easily pruned in the detection stage. 3) Extensive experiments show that AE TextSpotter outperforms other state-of-the-art methods by a large margin. For example, we carefully select a validation set of extremely ambiguous samples from the IC19-ReCTS dataset, where our approach surpasses other methods by more than 4%. The code has been released at https://github.com/whai362/AE_TextSpotter. The image list and evaluation scripts of the validation set have been released at https://github.com/whai362/TDA-ReCTS.

en cs.CV
DOAJ Open Access 2019
A READER RESPONSE APPROACH IN COLLABORATIVE READING PROJECTS TO FOSTER CRITICAL THINKING SKILLS

Truly Almendo Pasaribu, Yuseva Ariyani Iswandari

Reading has become a major concern of EFL educators. Reading does not only help students learn foreign languages; it is also believed to have a strong link with critical thinking skills. A reader response approach in collaborative work, adapted from literary theory, is believed to be beneficial for students. Therefore, this study aims at investigating the answers to two questions: (1) how are collaborative reader responses implemented in Critical Reading and Writing II? and (2) to what extent do reader response approaches promote students' critical thinking skills? With these questions in mind, the researchers collected data from 24 participants in the CRW II (Critical Reading and Writing) class. The data, gathered from classroom observations, online archives, and students' reflections, are analyzed descriptively using a qualitative case study method. It is hoped that the implementation of this approach can be useful not only for improving students' reading skills but also for providing more opportunities for students to exercise their critical thinking skills.

Education (General), Language and Literature

Page 48 of 31570