Results for "Philology. Linguistics"

S2 Open Access 1988

M. Bakhtin, M. Holquist, Caryl Emerson et al.

3356 citations en Sociology, History

arXiv Open Access 2026

Subword-Based Comparative Linguistics across 242 Languages Using Wikipedia Glottosets

Iaroslav Chelombitko, Mika Hämäläinen, Aleksey Komissarov

We present a large-scale comparative study of 242 Latin and Cyrillic-script languages using subword-based methodologies. By constructing 'glottosets' from Wikipedia lexicons, we introduce a framework for simultaneous cross-linguistic comparison via Byte-Pair Encoding (BPE). Our approach uses rank-based subword vectors to analyze vocabulary overlap, lexical divergence, and language similarity at scale. Evaluations demonstrate that BPE segmentation aligns with morpheme boundaries 95% better than a random baseline across 15 languages (F1 = 0.34 vs 0.15). BPE vocabulary similarity correlates significantly with genetic language relatedness (Mantel r = 0.329, p < 0.001), with Romance languages forming the tightest cluster (mean distance 0.51) and cross-family pairs showing clear separation (0.82). Analysis of 26,939 cross-linguistic homographs reveals that 48.7% receive different segmentations across related languages, with the variation correlating with phylogenetic distance. Our results provide quantitative macro-linguistic insights into lexical patterns across typologically diverse languages within a unified analytical framework.

en cs.CL, cs.AI
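
The framework described above rests on Byte-Pair Encoding. As a minimal sketch of the generic BPE merge-learning loop (not the authors' glottoset pipeline), the code below repeatedly fuses the most frequent adjacent symbol pair in a word-frequency table. The function name `learn_bpe`, the `</w>` end-of-word marker, the tie-breaking rule, and the toy counts are illustrative assumptions.

```python
from collections import Counter


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merge rules from a word -> frequency table."""
    # Start from single characters plus an end-of-word marker, so merges
    # never join the end of one word to the start of the next.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab: dict[tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            merged: list[str] = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(toy_counts, 3))
```

With the toy counts, the learned merges are `('e', 's')`, `('es', 't')` and `('est', '</w>')`. Breaking ties by first occurrence is a simplification; production tokenizers may order equally frequent pairs differently.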

arXiv Open Access 2026

Linguistic Signatures for Enhanced Emotion Detection

Florian Lecourt, Madalina Croitoru, Konstantin Todorov

Emotion detection is a central problem in NLP, with recent progress driven by transformer-based models trained on established datasets. However, little is known about the linguistic regularities that characterize how emotions are expressed across different corpora and labels. This study examines whether linguistic features can serve as reliable, interpretable signals for emotion recognition in text. We extract emotion-specific linguistic signatures from 13 English datasets and evaluate how incorporating these features into transformer models affects performance. Our RoBERTa-based models enriched with high-level linguistic features achieve consistent gains of up to +2.4 macro F1 on the GoEmotions benchmark, showing that explicit lexical cues can complement neural representations and improve robustness in predicting emotion categories.

en cs.CL

arXiv Open Access 2026

Versteasch du mi? Computational and Socio-Linguistic Perspectives on GenAI, LLMs, and Non-Standard Language

Verena Platzgummer, John McCrae, Sina Ahmadi

The design of large language models and generative artificial intelligence has been shown to be "unfair" to less-spoken languages and to deepen the digital language divide. Critical sociolinguistic work has also argued that these technologies are not only made possible by prior socio-historical processes of linguistic standardisation, often grounded in European nationalist and colonial projects, but also reinforce epistemologies of language as "monolithic, monolingual, syntactically standardized systems of meaning". In our paper, we draw on earlier work on the intersections of technology and language policy and bring our respective expertise in critical sociolinguistics and computational linguistics to bear on these arguments. We take two complexes of non-standard linguistic varieties from our respective repertoires (South Tyrolean dialects, which are widely used in informal communication in South Tyrol, Italy, and varieties of Kurdish) as starting points for an interdisciplinary exploration of the intersections between GenAI and linguistic variation and standardisation. We discuss both how LLMs can be made to handle non-standard language from a technical perspective, and whether, when or how this can contribute to "democratic and decolonial digital and machine learning strategies", a question with direct policy implications.

en cs.CL

S2 Open Access 2025

The Making of the Humanities

R. Bod, J. Maat, T. Weststeijn

Introduction: The Dawn of the Modern Humanities (Rens Bod)
Part I. Linguistics and Philology
1. The Rise of Philology: The Comparative Method, the Historicist Turn and the Surreptitious Influence of Giambattista Vico (Joep Leerssen)
2. Linguistics ante litteram: Compiling and Transmitting Views on Language Diversity and Relatedness before the Nineteenth Century (Toon van Hal)
3. The Rise of General Linguistics as an Academic Discipline: Georg von der Gabelentz as a Co-Founder (Els Elffers)
Part II. The Humanities and the Sciences
4. The Mutual Making of Sciences and Humanities: Willebrord Snellius, Jacob Golius, and the Early Modern Entanglement of Mathematics and Philology (Fokko Jan Dijksterhuis)
5. A 'Human' Science: Hawkins's Science of Music (Maria Semi)
6. Bopp the Builder. Discipline Formation as Hybridization: The Case of Comparative Linguistics (Bart Karstens)
Part III. Writing History and Intellectual History
7. Nineteenth-Century Historicism and its Predecessors: Historical Experience, Historical Ontology and Historical Method (Jacques Bos)
8. The Humanities as the Stronghold of Freedom: John Milton's Areopagitica and John Stuart Mill's On Liberty (Hilary Gatti)
9. Fact and Fancy in Nineteenth-Century Historiography and Fiction: The Case of Macaulay and Roidis (Foteini Lika)
Part IV. The Impact of the East
10. The Impact on the European Humanities of Early Reports from Catholic Missionaries from China, Tibet and Japan between 1600 and 1700 (Gerhard F. Strasser)
11. The Middle Kingdom in the Low Countries: Sinology in the Early Modern Netherlands (Thijs Weststeijn)
12. The Oriental Origins of Orientalism: The Case of Dimitrie Cantemir (Michiel Leezenberg)
Part V. Artworks and Texts
13. The Role of Emotions in the Development of Artistic Theory and the System of Literary Genres (Mats Malm)
14. Philology and the History of Art (Adi Efal)
Part VI. Literature and Rhetoric
15. Bourgeois versus Aristocratic Models of Scholarship: Medieval Studies at the Académie des Inscriptions, 1701-1751 (Alicia C. Montoya)
16. Ancients, Moderns and the Gothic in Eighteenth-Century Historiography (Neus Rotger)
17. The Afterlife of Rhetoric in Hobbes, Vico and Nietzsche (David L. Marshall)
Part VII. Academic Communities
18. The Documents of Feith: The Centralization of the Archive in Nineteenth-Century Historiography (Pieter Huistra)
19. Humboldt in Copenhagen: Discipline Formation in the Humanities at the University of Copenhagen in the Nineteenth Century (Claus Møller Jørgensen)
20. The Scholarly Self: Ideals of Intellectual Virtue in Nineteenth-Century Leiden (Herman Paul)
List of Contributors
List of Illustrations
Index

10 citations en History

S2 Open Access 2025

А.П. Дульзон – личность, ученый, наставник: к 125-летию со дня рождения [A.P. Dulzon: Personality, Scholar, Mentor. On the 125th Anniversary of His Birth]

Анна Анатольевна Богданова, Александра Аркадьевна Ким

The article examines the multifaceted scholarly legacy of Andrei Petrovich Dulzon (1900–1973), Doctor of Philological Sciences, professor, outstanding Russian linguist, ethnographer and archaeologist, and founder of the Tomsk school of field linguistics. It analyzes his contribution to the study of the history, languages and cultures of the peoples of Siberia, with special emphasis on the relationship between linguistic, archaeological and ethnographic data. It emphasizes the importance of his expeditionary work, the scholarly school he created and his unique archival collection for modern research.

1 citation en

DOAJ Open Access 2025

Le formule magiche medio inglesi del XV secolo tra convenzionalità e innovazione [Fifteenth-Century Middle English Magic Formulas between Conventionality and Innovation]

Donata Bulotta

The precarious health situation in England from the 14th century onwards led people to turn to any available curative means, whether scientific, religious or ritual-magical. In this context, healing charms were seen as accessible and practicable methods. They were often added to medical prescriptions and herbal remedies in medical or pseudo-pharmacological compilations, since they were considered an equally valid alternative form of therapy. Many charms created during this period mixed magic, religion and folklore, but some received new cultural stimuli by incorporating original elements and symbolism from Arabic, Greek and Hebrew magical texts introduced to the island. This work focuses on a selection of 15th-century healing charms. The analysis aims to demonstrate that the principles of the new occult and esoteric doctrines circulating in the island's intellectual and cultural centers influenced the magical healing ritual. The study of pseudo-Solomonic texts, although strongly censored by the Church, nevertheless contributed to the creation of new textual amulets, which were used alongside pre-existing charms and thus became a further alternative medium in therapeutic practice.

German literature, Philology. Linguistics

DOAJ Open Access 2025

A pragmatic analysis of deictic expressions used in the IELTS speaking test

Fadi Al-Khasawneh

This study investigates the role of deictic expressions in the IELTS speaking test, addressing a gap in research on how test-takers across proficiency levels use deixis in spoken language assessment. While previous studies have examined general discourse features in language testing, little attention has been paid to the frequency, functions, and distribution of deixis in assessing spoken proficiency. The study analysed a corpus of 30 IELTS speaking test transcripts covering proficiency levels from low-intermediate to advanced. Using Levinson’s classification of deixis, it combined quantitative frequency analysis with qualitative discourse analysis to examine variation in the use of personal, temporal, and spatial deixis. Personal deixis was the most frequent, followed by temporal and spatial deixis. However, a one-way ANOVA showed no significant differences in deixis use across proficiency levels. These findings contribute to English language teaching and assessment by showing how deixis functions in test-taker discourse, offering insights for IELTS preparation and the evaluation of speaking proficiency. The study points to the need for further exploration of discourse features in language assessment.

Education, Philology. Linguistics
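
The one-way ANOVA mentioned above compares variance between proficiency groups with variance within them. A minimal sketch, assuming per-transcript deixis counts grouped by proficiency level, computes the F statistic by hand; the function name `one_way_anova_f` and all counts are invented for illustration and are not the study's data. In practice, `scipy.stats.f_oneway` returns both the F statistic and its p-value.

```python
def one_way_anova_f(groups: list[list[float]]) -> float:
    """Return the one-way ANOVA F statistic for k >= 2 independent groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # F = mean square between / mean square within (k-1 and n-k degrees of freedom).
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Invented per-transcript counts of personal deixis, grouped by proficiency level.
    low = [12.0, 15.0, 11.0]
    mid = [14.0, 13.0, 16.0]
    high = [15.0, 12.0, 14.0]
    print(one_way_anova_f([low, mid, high]))
```

A small F (relative to the critical value for the given degrees of freedom) is what "no significant differences across proficiency levels" corresponds to.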

DOAJ Open Access 2025

Formation Resources of the English Terminology of Inclusive Education

Alina Dushkevych

The article offers a comprehensive analysis of the resources from which the English terminological system of inclusive education is formed in the modern educational environment. It considers the role of terminology as a tool for standardizing knowledge, communication and the scientific understanding of inclusion issues. It shows that the development of inclusive education requires a clearly delineated terminological apparatus, since terms ensure accuracy in defining concepts, unambiguous use and consistent interpretation of international and national educational documents. The English-language terminological system is grounded in international regulatory instruments such as the "Convention on the Rights of Persons with Disabilities" and the "Salamanca Statement and Framework for Action on Special Needs Education", as well as numerous US legislative acts (in particular the "Individuals with Disabilities Education Act", IDEA). Glossaries, encyclopedias and textbooks on pedagogy, psychology and special education play an important role in this process by systematizing, unifying and disseminating professional vocabulary. Particular attention is paid to key concepts of English-language inclusive education ("inclusive education", "special educational needs", "learning disabilities", "barrier-free environment", "universal design for learning", "accessibility") and their Ukrainian counterparts. The article stresses that translating and adapting terms requires attention not only to their lexical-semantic aspect but also to the cultural and pedagogical context, so as to avoid shifts in meaning.
The terminological base of inclusive education performs several functions: cognitive (ensuring the scientific validity of concepts), communicative (unifying interdisciplinary and intercultural communication), normative (consolidating standards in legislation and educational policy) and practical (supporting the effective work of teachers, psychologists and social workers). Terms must meet the criteria of accuracy, conciseness, unambiguity and international comprehensibility.

Discourse analysis, Computational linguistics. Natural language processing

CrossRef Open Access 2025

Historical Linguistics and Philology

Georgios K. Giannakis

en

arXiv Open Access 2025

MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs

Yufei Gao, Jiaying Fei, Nuo Chen et al.

Multimodal Large Language Models (MLLMs) have shown remarkable performance in high-resource languages, but their effectiveness diminishes significantly in low-resource language contexts. Current multilingual enhancement methods are often limited to the text modality or rely solely on machine translation. While such approaches help models acquire basic linguistic capabilities and produce "thin descriptions", they neglect multimodal informativeness and cultural groundedness, both of which are crucial for serving low-resource language users effectively. To bridge this gap, we identify two objectives for a truly effective MLLM in low-resource language settings, namely 1) linguistic capability and 2) cultural groundedness, placing special emphasis on cultural awareness. To achieve these dual objectives, we propose a dual-source strategy that guides the collection of data tailored to each goal, sourcing native web alt-text for culture and MLLM-generated captions for linguistics. As a concrete implementation, we introduce MELLA, a multimodal, multilingual dataset. Experimental results show that after fine-tuning on MELLA, performance generally improves for the eight languages across various MLLM backbones, with models producing "thick descriptions". We verify that the performance gains come from both cultural knowledge enhancement and linguistic capability enhancement. Our dataset can be found at https://opendatalab.com/applyMultilingualCorpus.

en cs.CV, cs.AI

S2 Open Access 2025

A Comparative Reading of Pāṇinian Grammatical Tradition and Modern Science of Speech

The modern history of Western linguistics began with comparative philology and for a long time coincided with the colonisation of the East. Colonisation as a process involved not only an interplay of power, dominance and the state but also a conquest of knowledge. Colonies such as India possessed a vast body of ancient knowledge and especially excelled in linguistics and philology. This paper attempts to show how the roots of various phonetic and phonological theories that defined and dominated modern linguistics were linked to the ancient Indian grammatical tradition. Scholars of the Pāṇinian school of grammar, such as Pāṇini, Kātyāyana, Patañjali and Bhartṛhari, explained a range of speech phenomena to which modern phonetics and phonology correspond closely. The paper analyses the common ground between prominent schools of Western phonology and their Indian counterparts, highlighting a significant theoretical overlap between the knowledge offered by Western linguistic schools and what prominent Indian grammarians explained centuries earlier. From the linking of sounds to the psychological reality of the phoneme, the vast canvas of the Indian linguistic tradition can verifiably be seen as a precursor to much of the structural turn of the twentieth century. Finally, the paper attempts to show the precedence of more recent concepts and theories, such as ‘distinctive feature theory’ or ‘generative grammar’, in texts like the Aṣṭādhyāyī and the Vākyapadīya.

en

S2 Open Access 2024

On Russian national corpus

E. Rakhilina

The article describes the Russian National Corpus (RNC) project, a powerful reference and information system for the Russian language created by a consortium of institutions of the Russian Academy of Sciences with the active participation of the Russian IT company Yandex. The history of the Corpus is presented in great detail: the author comments on its main functionality and its most technologically advanced subcorpora (poetic, parallel, multimedia), providing examples of their use. Special attention is paid to the latest developments, which make it possible to introduce modern AI technologies into the RNC; this work was supported by a grant from the Ministry of Education and Science of the Russian Federation. One of the most impressive results is the so-called “panchronic corpus”, which covers the thousand-year history of the Russian language and provides tools for searching this data array. The RNC is now a crucial resource for research in linguistics and philology, for the methodology of teaching Russian as a first and second language, and for information technology.

14 citations en

DOAJ Open Access 2024

Examining the escalation of hostility in social media: a comparative analysis of online incivility in China and the United States regarding the Russia–Ukraine war

Li Yanbo, Su Chris Chao

This study examines and compares online incivility on China’s Weibo and the U.S.’s X (Twitter) amid the Russia-Ukraine conflict, aiming to unravel how different cultural and geopolitical contexts influence online incivility and identify factors that may influence the occurrence of online incivility in different national contexts.

Communication. Mass media

DOAJ Open Access 2024

Developing Artificial Intelligence-Powered Monetary Policy Communication Indicators for Macroeconomic Inquiries in Ghana

Francis Mawuli Abude, Jones Odei-Mensah, Eric Schaling

Central bank communication is a valuable source of information designed to shape the expectations of economic agents within and outside an economy. In particular, the content of Monetary Policy Committees’ press releases and statements reflects central banks’ views of current and future macroeconomic developments, making it useful for creating high-frequency indicators as alternatives to traditional, slower-to-publish macroeconomic indicators. In this study, artificial intelligence (AI)-powered text-mining techniques were used to create monetary policy communication-based indicators, namely the Monetary Policy Readability Index (MPRI), the Monetary Policy Sentiment Index (MPSI), and the Monetary Policy Uncertainty Index (MPUI), from press releases of the Bank of Ghana's Monetary Policy Committee spanning January 2003 to December 2022. The findings suggest that while readability and sentiment generally declined over the sample period, uncertainty increased, indicating persistent macroeconomic imbalances and vulnerabilities in the domestic economy. The new time-series indicators exhibit Granger-causal relationships with key macroeconomic variables, affirming their relevance to the central bank, the Ministry of Finance, researchers, investors, and development partners. Notably, the indicators can serve as an early warning system for monitoring and predicting the country's macroeconomic risks, forecasting lagging indicators, assessing the effectiveness of the Bank’s monetary policy communication, and addressing monetary policy inquiries.

Communication. Mass media
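
The abstract does not specify how the sentiment index is computed. One common and simple construction, sketched here as an assumption rather than the authors' method, is a lexicon-based net tone score per press release, collected into a date-ordered series. The word lists `POSITIVE` and `NEGATIVE`, the function names, and the sample releases are hypothetical.

```python
import re

# Hypothetical toy lexicons for illustration only; they are not the word lists
# (or model) behind the paper's MPSI, which the abstract does not describe.
POSITIVE = {"growth", "stable", "improved", "strong", "recovery"}
NEGATIVE = {"inflation", "risk", "decline", "weak", "uncertainty"}


def net_sentiment(text: str) -> float:
    """Net tone in [-1, 1]: (positive - negative) / (positive + negative) hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    # Return a neutral score when no lexicon word occurs.
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_series(releases: dict[str, str]) -> dict[str, float]:
    """Map each release date (any sortable key) to its net tone, in date order."""
    return {date: net_sentiment(releases[date]) for date in sorted(releases)}


if __name__ == "__main__":
    toy_releases = {
        "2022-01": "Inflation risk and uncertainty persist amid strong demand.",
        "2021-11": "Growth was strong and stable.",
    }
    print(sentiment_series(toy_releases))
```

A series built this way could then be tested for Granger causality against macroeconomic variables, though the choice of lexicon largely determines what the index measures.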

DOAJ Open Access 2023

D’où viennent les relatives ? [Where Do Relative Clauses Come From?]

Pierre Le Goffic

Relative clauses (of the type … le livre qui est sur la table… 'the book that is on the table', la lettre que tu as écrite… 'the letter that you wrote') appeared in Latin through a shift from the correlative structure quas litteras scripsisti, eae… 'which letter you wrote, that one…' to litterae quas scripsisti… 'the letter that you wrote…': the qu- word quas, which expressed a variable (quas litteras = a letter x) and was initially a determiner of the noun, became pronominalized and anaphoric to the noun, which became its antecedent. This structure developed greatly in Latin and passed into French despite the morphosyntactic transformations of late Latin. The relative pronouns (which have a noun antecedent) separated from the other qu- words to form a heterogeneous paradigm drawing on several logical principles: the case opposition between qui (subject) and que (direct object), reinforcement by the adverbs où 'where' and dont 'from where', the Romance invention of lequel, and the late (17th-century) and partial return of the +/-H (human) distinction after prepositions. This heterogeneity does not prevent the extensive use of relative clauses in contemporary French.

Philology. Linguistics

DOAJ Open Access 2023

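Namen im südlichen Ermland. Beobachtungen zur Tätigkeit von Komisja Ustalania Nazw Miejscowości am Beispiel von Toponymen der Gemeinden Gietrzwałd und Stawiguda [Names in Southern Warmia: Observations on the Work of the Komisja Ustalania Nazw Miejscowości, Illustrated by Toponyms of the Gietrzwałd and Stawiguda Municipalities]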

Magdalena Lidia Lobert

The article discusses selected place names in southern Warmia and subjects them to linguistic analysis. On this basis, the toponyms are classified by the linguistic affiliation of their morphemes, which derive from Polish, German and Prussian. The article also presents the history of the Commission for the Determination of Place Names, which, immediately after World War II and the incorporation of Warmia into the Polish state, began intensive work on assigning Polish equivalents to German names. The collected material makes it possible to identify the methods the Commission most likely used and to draw conclusions about the impact of national identity and state policy on changes in place naming in Warmia.

Language. Linguistic theory. Comparative grammar

arXiv Open Access 2023

Three-way Decisions with Evaluative Linguistic Expressions

Stefania Boffa, Davide Ciucci

We propose a linguistic interpretation of three-way decisions, in which the regions of acceptance, rejection, and non-commitment are constructed using so-called evaluative linguistic expressions, that is, natural-language expressions such as "small", "medium", "very short", "quite roughly strong" and "extremely good". Our results highlight new connections between two different research areas: three-way decisions and the theory of evaluative linguistic expressions.

en cs.CL
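
Three-way decisions in the sense of Yao split objects into acceptance, rejection and non-commitment regions using two thresholds α > β on an evaluation. As a minimal sketch, assuming the evaluation is the truth degree of an evaluative expression such as "very big", the code below models "big" with a hypothetical piecewise-linear fuzzy set and "very" with Zadeh's squaring hedge. This is a simplification for illustration, not the formal model of evaluative expressions the paper builds on; all function names and threshold values are assumptions.

```python
from typing import Callable, Iterable


def three_way_regions(
    objects: Iterable[float],
    evaluate: Callable[[float], float],
    alpha: float,
    beta: float,
) -> dict[str, list[float]]:
    """Split objects into acceptance, rejection and non-commitment regions."""
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    regions: dict[str, list[float]] = {"accept": [], "reject": [], "defer": []}
    for x in objects:
        degree = evaluate(x)  # degree in [0, 1] to which x satisfies the expression
        if degree >= alpha:
            regions["accept"].append(x)
        elif degree <= beta:
            regions["reject"].append(x)
        else:
            regions["defer"].append(x)  # non-commitment (boundary) region
    return regions


def big(value: float, low: float = 0.3, high: float = 0.7) -> float:
    """Hypothetical piecewise-linear degree of 'value is big' on a [0, 1] scale."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def very(membership: Callable[[float], float]) -> Callable[[float], float]:
    """Zadeh-style 'very' hedge: squares the degree (a simplification)."""
    return lambda value: membership(value) ** 2


if __name__ == "__main__":
    print(three_way_regions([0.2, 0.5, 0.9], very(big), alpha=0.8, beta=0.2))
```

With α = 0.8 and β = 0.2, the value 0.5 has a "very big" degree of about 0.25 and therefore lands in the non-commitment region rather than being forced into acceptance or rejection.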

arXiv Open Access 2023

Linguistic Properties of Truthful Response

Bruce W. Lee, Benedict Florance Arockiaraj, Helen Jin

We investigate untruthful responses from LLMs using a large set of 220 handcrafted linguistic features. We focus on GPT-3 models and find that the linguistic profiles of responses are similar across model sizes; that is, LLMs of different sizes respond to the same prompts with similar linguistic properties. We build on this finding by training support vector machines that rely only on the stylistic components of model responses to classify the truthfulness of statements. Although the dataset size limits our current findings, we show that truthfulness may be detectable without evaluating the content itself. At the same time, the limited scope of our experiments must be taken into account when interpreting the results.

en cs.CL, cs.AI

arXiv Open Access 2023

AI Nushu: An Exploration of Language Emergence in Sisterhood - Through the Lens of Computational Linguistics

Yuqian Sun, Yuying Tang, Ze Gao et al.

This paper presents "AI Nushu," an emerging language system inspired by Nushu (women's script), the unique writing system created and used exclusively by ancient Chinese women who were considered illiterate in a patriarchal society. In this interactive installation, two artificial intelligence (AI) agents are trained on a Chinese dictionary and the Nushu corpus. By continually observing their environment and communicating, the agents collaborate to create a standard writing system for encoding Chinese. The work offers an artistic interpretation of the creation of a non-Western script from a computational linguistics perspective, integrating AI technology with Chinese cultural heritage and a feminist viewpoint.

en cs.CL, cs.AI
