Hasil "Philology. Linguistics"

S2 Open Access 2002

Corpus Linguistics at Work

E. Tognini-Bonelli

1337 sitasi en Computer Science

arXiv Open Access 2026

Early Linguistic Pattern of Anxiety from Social Media Using Interpretable Linguistic Features: A Multi-Faceted Validation Study with Author-Disjoint Evaluation

Arnab Das Utsa

Anxiety affects hundreds of millions of individuals globally, yet large-scale screening remains limited. Social media language provides an opportunity for scalable detection, but current models often lack interpretability, keyword-robustness validation, and rigorous user-level data integrity. This work presents a transparent approach to social media-based anxiety detection through linguistically interpretable feature-grounded modeling and cross-domain validation. Using a substantial dataset of Reddit posts, we trained a logistic regression classifier on carefully curated subreddits for training, validation, and test splits. Comprehensive evaluation included feature ablation, keyword masking experiments, and varying-density difference analyses comparing anxious and control groups, along with external validation using clinically interviewed participants with diagnosed anxiety disorders. The model achieved strong performance while maintaining high accuracy even after sentiment removal or keyword masking. Early detection using minimal post history significantly outperformed random classification, and cross-domain analysis demonstrated strong consistency with clinical interview data. Results indicate that transparent linguistic features can support reliable, generalizable, and keyword-robust anxiety detection. The proposed framework provides a reproducible baseline for interpretable mental health screening across diverse online contexts.

en cs.CL, cs.AI

Detail Sumber

S2 Open Access 2026

Makhov A. E. Selected Works: in 3 vols. Tula: Aquarius, 2025. Vol. 2. On the Music of the Word / compiled, edited, and introduced by O. L. Dovgiy. 780 p.: Book review

S. A. Makarova

The review presents an analysis of the book “On the Music of the Word”, included in the collection of selected works by the prominent Soviet and Russian humanities scholar, Doctor of Philology, Professor at the Russian State University for the Humanities, and leading researcher at the Institute of Scientific Information on Social Sciences of the Russian Academy of Sciences and the Gorky Institute of World Literature of the Russian Academy of Sciences, Alexander Evgenievich Makhov (1959-2021). The book, structured chronologically, brings together interdisciplinary materials of various genres: the monograph “Musica Literaria: The Idea of Verbal Music in European Poetics”; the doctoral dissertation “The System of Musicological Concepts and Terms in the History of European Poetics” along with documents related to its defense; articles from various years; and the plan for the book “Poetics and Music”. These works offer insights not only into the centuries-old relationship between word and music and the various facets of A. E. Makhov’s concept of verbal musicality but also into the stages of their formation and development. The edition is intended for specialists in literary theory and history, linguistics, musicology, philosophy, aesthetics, and cultural studies, as well as a wide range of readers.

en

Detail DOI Sumber

S2 Open Access 2025

DYNAMICS OF LINGUISTIC CONCEPTS OF THE DEPARTMENT OF RUSSIAN LANGUAGE OF MPGU IN THE LIGHT OF CHANGING SCIENTIFIC PARADIGMS (TO THE 90TH ANNIVERSARY OF THE RE-ESTABLISHMENT OF THE DEPARTMENT)

S. Kolesnikova, A. Gryaznova

The academic traditions of the Russian Language Department of the Institute of Philology of Moscow Pedagogical State University are closely connected with Russian linguistic concepts and the whole history of the university. Between 1872 and 1935, Russian philology defined the academic concept of the Russian language as a social phenomenon, an independent discipline, a sphere of academic knowledge and a means of professional training of future teachers, its role in teaching school disciplines and personality formation. The foundation of the Department’s linguistic outlook was formed by the ideas of the teaching staff of Moscow Higher Women’s Courses, where the Russian language had an interdisciplinary character: the Russian language at the Department was studied in close interrelation with literature in line with the comparative-historical philological paradigm prevailing at that time. Prominent Soviet philologists (P.N. Sakulin, A.M. Peshkovsky, N.M. Karinsky, A.M. Selischev, M.N. Peterson, D.N. Ushakov, G.O. Vinokur, V.V. Vinogradov), who studied the Russian language based on the structural-system approach with the use of statistical and formal-grammatical methods, worked at the Literature and Linguistics Department of the Pedagogical Faculty of the 2nd Moscow State University at different times. paradigm: for the study of cognitive science, functional potential of grammar, discourse studies, psycholinguistics and other integrative sciences. The ideas presented in the works of the greatest Russian dialectologists A.M. Selischev and N.M. Karinsky on comparative-historical grammar and onomastics remain important. During the study of syntactic units, statistical and formal-grammatical methods were applied. The modern academic linguistic paradigm was formed under the direct influence of the concepts developed by the professorial staff of the MVZhK MGPI, and includes the range of problems discussed today by the representatives of the Russian Language Department of the Institute of Philology of Moscow Pedagogical State University. The continuity of academic knowledge is reflected in new monographs and educational publications by the teaching staff of the department in co-operation with other Russian scholars.

1 sitasi en

Detail DOI Sumber

S2 Open Access 2025

Why “Real men don't speak French”: Deconstructing cultural attitudes to a language by historicizing their discursive formations

S. Coffey

Guided by Foucault's concept of “discursive formations,” the study reported here draws on primary archival and secondary source material to examine how French has been discursively shaped in England and in relation to English. Unpacking sociohistorical constructions of sameness–difference offers a productive frame to explore ideological positionings in new, interdisciplinary ways that have thus far been underdeveloped in applied linguistics. The study historicizes attitudes to French in England from the 16th century, a time characterized by the coupling of language and nation that has echoed down the ages voiced as received wisdom. While French remained the dominant European vernacular during the early modern period, French in England was increasingly framed as a threat against increasingly nationalist, patriarchal models of language, whereby mythologizing histories positioned French as florid and effete in opposition to plain, manly Saxon English. Not only were boys and girls encouraged to learn different versions of French (different content and different skills) but racialized philology also sought to expunge the etymologically French fabric from English. Learning foreign languages, even the adoption of loan terms, was fraught with the risk of pretentiously identifying too strongly with the other and of disidentifying with home and nation.

1 sitasi en

Detail DOI Sumber

arXiv Open Access 2025

Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context

Maitreyi Chatterjee, Devansh Agarwal

Large Language Models (LLMs) have demonstrated impressive fluency and task competence in conversational settings. However, their effectiveness in multi-session and long-term interactions is hindered by limited memory persistence. Typical retrieval-augmented generation (RAG) systems store dialogue history as dense vectors, which capture semantic similarity but neglect finer linguistic structures such as syntactic dependencies, discourse relations, and coreference links. We propose Semantic Anchoring, a hybrid agentic memory architecture that enriches vector-based storage with explicit linguistic cues to improve recall of nuanced, context-rich exchanges. Our approach combines dependency parsing, discourse relation tagging, and coreference resolution to create structured memory entries. Experiments on adapted long-term dialogue datasets show that semantic anchoring improves factual recall and discourse coherence by up to 18% over strong RAG baselines. We further conduct ablation studies, human evaluations, and error analysis to assess robustness and interpretability.

en cs.CL

Detail Sumber

arXiv Open Access 2025

Breaking Physical and Linguistic Borders: Multilingual Federated Prompt Tuning for Low-Resource Languages

Wanru Zhao, Yihong Chen, Royson Lee et al.

Pre-trained large language models (LLMs) have become a cornerstone of modern natural language processing, with their capabilities extending across a wide range of applications and languages. However, the fine-tuning of multilingual LLMs, especially for low-resource languages, faces significant challenges arising from data-sharing restrictions (the physical border) and inherent linguistic differences (the linguistic border). These barriers hinder users of various languages, particularly those in low-resource regions, from fully benefiting from the advantages of LLMs. To address these challenges, we propose the Federated Prompt Tuning Paradigm for multilingual scenarios, which utilizes parameter-efficient fine-tuning while adhering to data sharing restrictions. We design a comprehensive set of experiments and analyze them using a novel notion of language distance to highlight the strengths of our paradigm: Even under computational constraints, our method not only improves data efficiency but also facilitates mutual enhancements across languages, particularly benefiting low-resource ones. Compared to traditional local cross-lingual transfer tuning methods, our approach achieves 6.9\% higher accuracy with improved data efficiency, and demonstrates greater stability and generalization. These findings underscore the potential of our approach to promote social equality and champion linguistic diversity, ensuring that no language is left behind.

en cs.CL

Detail Sumber

arXiv Open Access 2025

MathRobust-LV: Evaluation of Large Language Models' Robustness to Linguistic Variations in Mathematical Reasoning

Neeraja Kirtane, Yuvraj Khanna, Peter Relan

Large language models excel on math benchmarks, but their math reasoning robustness to linguistic variation is underexplored. While recent work increasingly treats high-difficulty competitions like the IMO as the gold standard for evaluating reasoning, we believe in comprehensive benchmarking of high school-level math problems in real educational settings. We introduce MathRobust-LV, a test set and evaluation methodology that mirrors how instructors rephrase problems across assessments while keeping difficulty constant: we change surface details (names, contexts, variables) while preserving numerical structure and answers. In contrast to prior efforts that alter problem content or emphasize IMO-level tasks, we focus on high-school-level dataset problems at the difficulty level where models are currently deployed in educational settings: tutoring and assessment systems. In these applications, instructors rephrase identical concepts in varied ways, making linguistic robustness essential for reliable deployment. Although MATH data benchmarking is often regarded as saturated, our experiment on 34 models reveals that accuracy declines when moving from the baseline to the variants. These drops are severe for smaller models (9-11%) while stronger models also show measurable degradation. Frontier models like GPT-5, Gemini-2.5pro remain comparatively stable. Our results highlight that robustness to linguistic variation is a fundamental challenge, exposing reasoning vulnerabilities in models.

en cs.CL

Detail Sumber

S2 Open Access 2025

Mediadiscourse: The concept, approaches to studying, research perspectives

M. A. Davydova

This paper offers a media discourse conceptualization through a variety of theoretical perspectives and approaches. The emphasis is placed on examining media discourse through the prism of the communicative and interdisciplinary approaches. The aim of the article is to reveal the concept of media discourse in the aspect of communication theory principles, to review traditional approaches to its study and to identify the prospects for further research, offering new angles of the media text interdisciplinary analysis. The article is theoretical in nature, the main research methods are general scientific ones: a theoretical analysis, an abstract presentation with generalization and heuristic elements, the analysis and synthesis of theoretical data, the comparison, contrast and descriptive methods. The object of study is media discourse, the subject is its characteristic properties, the approaches to its definition in different fields of knowledge, primarily in philology. As a result, we identify promising directions of media discourse research and its characteristic features – dialogicality, “new” colloquiality, multimodality, discursiveness and global contextuality. The scientific novelty of the research is its orientation towards the communicative and interdisciplinary approach and the prospect of studying this object. Being a complex new phenomenon that is still developing, media discourse cannot be studied by the tools of one isolated field of knowledge (for example, linguistics), because it is a voluminous and highly complex structure that includes, in addition to language, situation, event, attitude to it, political narratives, social stereotypes and historical references. The practical significance of the study is the presentation of new approaches to the study of media discourse. The study of media discourse offers broad prospects for use in media education, political communication and communication-oriented research.

en

Detail DOI Sumber

arXiv Open Access 2024

Human behaviour through a LENS: How Linguistic content triggers Emotions and Norms and determines Strategy choices

Valerio Capraro

Over the last two decades, a growing body of experimental research has provided evidence that linguistic frames influence human behaviour in economic games, beyond the economic consequences of the available actions. This article proposes a novel framework that transcends the traditional confines of outcome-based preference models. According to the LENS model, the Linguistic description of the decision problem triggers Emotional responses and suggests potential Norms of behaviour, which then interact to shape an individual's Strategic choice. The article reviews experimental evidence that supports each path of the LENS model. Furthermore, it identifies and discusses several critical research questions that arise from this model, pointing towards avenues for future inquiry.

en cs.CL, cs.GT

Detail Sumber

DOAJ Open Access 2024

Are We Replicating Yet? Reproduction and Replication in Communication Research

Johannes Breuer, Mario Haim

The replication crisis has highlighted the importance of reproducibility and replicability in the social and behavioral sciences, including in communication research. While there have been some discussions of and studies on replications in communication research, the extent of this work is significantly lower than in psychology. The key reasons for this limitation are the differences between the disciplines in the topics commonly studied and in the methods and data commonly used in communication research. Communication research often investigates dynamic topics and uses methods (e.g., content analysis) and data types (e.g., media content and social media data) that are not used, or, at least, are much less frequently used, in other fields. These specific characteristics of communication research must be considered and require a more nuanced understanding of reproducibility and replicability. This thematic issue includes commentaries presenting different perspectives, as well as methodological and empirical work investigating the reproducibility and replicability of a wide range of communication research, including surveys, experiments, systematic literature reviews, and studies that involve social media or audio data. The articles in this issue acknowledge the diversity and unique features of communication research and present various ways of improving its reproducibility and replicability, as well as our understanding thereof.

Communication. Mass media

Detail DOI Sumber

CrossRef Open Access 2023

Revealing attitude toward statistics among MA TEFL students: A Systemic Functional Linguistics perspective

Joanna Pitura

Higher education students’ attitudes toward statistics are of great importance as these may affect student performance and achievement in statistics learning. Despite some interest in attitude towards statistics among higher education students in language-related fields and the broader area of tertiary-level education, the different types of existing attitudes toward statistics might remain uncovered. Applying the Attitude system as an analytical framework, this study explores how higher education students use linguistic resources to indicate statistics-related attitude before, during and after learning. The narratives of learning statistics were obtained from a group of MA TEFL students (n=25) who participated in an introductory course on basic statistical concepts and procedures, among others. The study makes visible a great variety and a considerable variation of linguistic means which students use to express statistics-related attitude as Judgement, Appreciation and Affect, emerging over time. As such, this study advances methodological practices and training in the field.

en

Detail DOI Sumber

arXiv Open Access 2023

The language of prompting: What linguistic properties make a prompt successful?

Alina Leidinger, Robert van Rooij, Ekaterina Shutova

The latest generation of LLMs can be prompted to achieve impressive zero-shot or few-shot performance in many NLP tasks. However, since performance is highly sensitive to the choice of prompts, considerable effort has been devoted to crowd-sourcing prompts or designing methods for prompt optimisation. Yet, we still lack a systematic understanding of how linguistic properties of prompts correlate with task performance. In this work, we investigate how LLMs of different sizes, pre-trained and instruction-tuned, perform on prompts that are semantically equivalent, but vary in linguistic structure. We investigate both grammatical properties such as mood, tense, aspect and modality, as well as lexico-semantic variation through the use of synonyms. Our findings contradict the common assumption that LLMs achieve optimal performance on lower perplexity prompts that reflect language use in pretraining or instruction-tuning data. Prompts transfer poorly between datasets or models, and performance cannot generally be explained by perplexity, word frequency, ambiguity or prompt length. Based on our results, we put forward a proposal for a more robust and comprehensive evaluation standard for prompting research.

en cs.CL, cs.AI

Detail Sumber

arXiv Open Access 2022

Acoustic-Linguistic Features for Modeling Neurological Task Score in Alzheimer's

Saurav K. Aryal, Howard Prioleau, Legand Burge

The average life expectancy is increasing globally due to advancements in medical technology, preventive health care, and a growing emphasis on gerontological health. Therefore, developing technologies that detect and track aging-associated disease in cognitive function among older adult populations is imperative. In particular, research related to automatic detection and evaluation of Alzheimer's disease (AD) is critical given the disease's prevalence and the cost of current methods. As AD impacts the acoustics of speech and vocabulary, natural language processing and machine learning provide promising techniques for reliably detecting AD. We compare and contrast the performance of ten linear regression models for predicting Mini-Mental Status Exam scores on the ADReSS challenge dataset. We extracted 13000+ handcrafted and learned features that capture linguistic and acoustic phenomena. Using a subset of 54 top features selected by two methods: (1) recursive elimination and (2) correlation scores, we outperform a state-of-the-art baseline for the same task. Upon scoring and evaluating the statistical significance of each of the selected subset of features for each model, we find that, for the given task, handcrafted linguistic features are more significant than acoustic and learned features.

en cs.CE, cs.CL

Detail Sumber

arXiv Open Access 2022

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan et al.

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world. Most visual description methods are known to capture and exploit patterns in the training data leading to evaluation metric increases, but what are those patterns? In this work, we examine several popular visual description datasets, and capture, analyze, and understand the dataset-specific linguistic patterns that models exploit but do not generalize to new domains. At the token level, sample level, and dataset level, we find that caption diversity is a major driving factor behind the generation of generic and uninformative captions. We further show that state-of-the-art models even outperform held-out ground truth captions on modern metrics, and that this effect is an artifact of linguistic diversity in datasets. Understanding this linguistic diversity is key to building strong captioning models, we recommend several methods and approaches for maintaining diversity in the collection of new data, and dealing with the consequences of limited diversity when using current models and metrics.

en cs.CV, cs.CL

Detail Sumber

arXiv Open Access 2022

MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi

Aryaman Arora, Nitin Venkateswaran, Nathan Schneider

We present a completed, publicly available corpus of annotated semantic relations of adpositions and case markers in Hindi. We used the multilingual SNACS annotation scheme, which has been applied to a variety of typologically diverse languages. Building on past work examining linguistic problems in SNACS annotation, we use language models to attempt automatic labelling of SNACS supersenses in Hindi and achieve results competitive with past work on English. We look towards upstream applications in semantic role labelling and extension to related languages such as Gujarati.

en cs.CL

Detail Sumber

DOAJ Open Access 2022

Joining the Dialogue: Practices for Ethical Research Writing. Bettina Stumm. Broadview Press, 2021

Andreas Herzog

Discourse analysis

Detail DOI Sumber

arXiv Open Access 2021

Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights

Devaraja Adiga, Rishabh Kumar, Amrith Krishna et al.

Automatic speech recognition (ASR) in Sanskrit is interesting, owing to the various linguistic peculiarities present in the language. The Sanskrit language is lexically productive, undergoes euphonic assimilation of phones at the word boundaries and exhibits variations in spelling conventions and in pronunciations. In this work, we propose the first large scale study of automatic speech recognition (ASR) in Sanskrit, with an emphasis on the impact of unit selection in Sanskrit ASR. In this work, we release a 78 hour ASR dataset for Sanskrit, which faithfully captures several of the linguistic characteristics expressed by the language. We investigate the role of different acoustic model and language model units in ASR systems for Sanskrit. We also propose a new modelling unit, inspired by the syllable level unit selection, that captures character sequences from one vowel in the word to the next vowel. We also highlight the importance of choosing graphemic representations for Sanskrit and show the impact of this choice on word error rates (WER). Finally, we extend these insights from Sanskrit ASR for building ASR systems in two other Indic languages, Gujarati and Telugu. For both these languages, our experimental results show that the use of phonetic based graphemic representations in ASR results in performance improvements as compared to ASR systems that use native scripts.

en eess.AS, cs.CL

Detail Sumber

arXiv Open Access 2021

LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction

Abhinandan Desai, Kai North, Marcos Zampieri et al.

This paper describes team LCP-RIT's submission to the SemEval-2021 Task 1: Lexical Complexity Prediction (LCP). The task organizers provided participants with an augmented version of CompLex (Shardlow et al., 2020), an English multi-domain dataset in which words in context were annotated with respect to their complexity using a five point Likert scale. Our system uses logistic regression and a wide range of linguistic features (e.g. psycholinguistic features, n-grams, word frequency, POS tags) to predict the complexity of single words in this dataset. We analyze the impact of different linguistic features in the classification performance and we evaluate the results in terms of mean absolute error, mean squared error, Pearson correlation, and Spearman correlation.

en cs.CL

Detail Sumber

arXiv Open Access 2021

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Changyao Tian, Wenhai Wang, Xizhou Zhu et al.

Deep learning-based models encounter challenges when processing long-tailed data in the real world. Existing solutions usually employ some balancing strategies or transfer learning to deal with the class imbalance problem, based on the image modality. In this work, we present a visual-linguistic long-tailed recognition framework, termed VL-LTR, and conduct empirical studies on the benefits of introducing text modality for long-tailed recognition (LTR). Compared to existing approaches, the proposed VL-LTR has the following merits. (1) Our method can not only learn visual representation from images but also learn corresponding linguistic representation from noisy class-level text descriptions collected from the Internet; (2) Our method can effectively use the learned visual-linguistic representation to improve the visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set the new state-of-the-art performance on widely-used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points, and is close to the prevailing performance training on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.

en cs.CV

Detail Sumber

Hasil untuk "Philology. Linguistics"