Results for "Greek philology and language"

Showing 20 of ~1458205 results · from DOAJ, CrossRef, arXiv, Semantic Scholar

arXiv Open Access 2026
Improving Variable-Length Generation in Diffusion Language Models via Length Regularization

Zicong Cheng, Ruixuan Jia, Jia Li et al.

Diffusion Large Language Models (DLLMs) are inherently ill-suited for variable-length generation, as their inference is defined on a fixed-length canvas and implicitly assumes a known target length. When the length is unknown, as in realistic completion and infilling, naively comparing confidence across mask lengths becomes systematically biased, leading to under-generation or redundant continuations. In this paper, we show that this failure arises from an intrinsic length-induced bias in generation confidence estimates, leaving existing DLLMs without a robust way to determine generation length and making variable-length inference unreliable. To address this issue, we propose LR-DLLM, a length-regularized inference framework for DLLMs that treats generation length as an explicit variable and achieves reliable length determination at inference time. It decouples semantic compatibility from length-induced uncertainty through an explicit length regularization that corrects biased confidence estimates. Based on this, LR-DLLM enables dynamic expansion or contraction of the generation span without modifying the underlying DLLM or its training procedure. Experiments show that LR-DLLM achieves 51.3% Pass@1 on HumanEval-Infilling under fully unknown lengths (+13.4% vs. DreamOn) and 51.5% average Pass@1 on four-language McEval (+14.3% vs. DreamOn).

en cs.CL, cs.LG
arXiv Open Access 2026
Introducing MELI: the Mandarin-English Language Interview Corpus

Suyuan Liu, Molly Babel

We introduce the Mandarin-English Language Interview (MELI) Corpus, an open-source resource of 29.8 hours of speech from 51 Mandarin-English bilingual speakers. MELI combines matched sessions in Mandarin and English with two speaking styles: read sentences and spontaneous interviews about language varieties, standardness, and learning experiences. Audio was recorded at 44.1 kHz (16-bit, stereo). Interviews were fully transcribed, force-aligned at word and phone levels, and anonymized. Descriptively, the Mandarin component totals ~14.7 hours (mean duration 17.3 minutes) and the English component ~15.1 hours (mean duration 17.8 minutes). We report token/type statistics for each language and document code-switching patterns (frequent in Mandarin sessions; more limited in English sessions). The corpus design supports within-/cross-speaker, within/cross-language acoustic comparison and links acoustics to speakers' stated language attitudes, enabling both quantitative and qualitative analyses. The MELI Corpus will be released with transcriptions, alignments, metadata, scans of labelled maps and documentation under a CC BY-NC 4.0 license.

en cs.CL
S2 Open Access 2026
Comparative etymological analysis of naming traditions in English and Azerbaijani

Gunay Babazade

This article provides a comparative etymological analysis of naming traditions in English and Azerbaijani, focusing on the linguistic, historical, and cultural foundations of personal names. Personal names serve not only as identifiers but also as important cultural and social indicators that reflect a society’s values, beliefs, and historical experience. Through the study of naming traditions, it is possible to trace linguistic contacts, cultural influences, and changes in social structure over time. The research examines the etymological origins and semantic characteristics of personal names in both languages. English naming traditions are largely influenced by Germanic, Latin, Greek, and biblical sources, which entered the language through historical events such as Christianization and the Norman Conquest. Azerbaijani naming traditions, on the other hand, have been shaped mainly by Turkic roots, as well as strong Arabic and Persian influences due to religious, literary, and cultural interactions. The article compares the structural features and meanings of names in both languages and identifies similarities and differences in their formation and usage. The findings indicate that despite linguistic and historical differences, English and Azerbaijani naming traditions fulfill similar social and cultural functions. This research contributes to the fields of comparative linguistics and onomastics and may serve as a useful resource for students studying linguistics, philology, and intercultural communication.

S2 Open Access 2025
Spatial Generation in Indo-European Languages and Culture

Maxim Shchegolev

The article examines the connections between space, language and cognition in the Indo-European linguistic and cultural sphere. The relevance of the research topic is due to the growing recognition of the interdisciplinary nature of humanitarian research and the ongoing debate about the relationship between language and cognition, including in relation to spatial categories. In order to deepen the understanding of the bidirectional relationship between the environment and cognition, the concept of ‘spatial generation’ is introduced as an interdisciplinary understanding of how signification functions. The subject of study is archaic texts of the Indo-European culture. The interdisciplinary approach combines the following methods: etymological analysis, structural analysis, comparative conceptual text analysis, philological text analysis and semiotic analysis. The sources of ancient Indian, ancient Iranian, ancient Germanic, ancient Greek and Anatolian traditions served as materials. The research focuses on Indo-European cultures, emphasizing their consistent spatial conceptualization: anthropocentric organization, binary oppositions and direction archetypes. In particular, gender spatial concepts are considered, in which male terms are associated with power and activity and female terms with passivity and fertility. These concepts can be considered not just as ideas, but as fixed in language and cognition, strengthening social hierarchies and power structures. The concept of spatial generation is tested on archaic texts of Indo-European languages. A distinction is made between intradirectional generation, when the physical world directly shapes cognition and language, and extradirectional generation, when human cognition and culture are projected onto the world and change it.
The article emphasizes the ability of the concept to integrate various disciplines, from philology and archaeology to psychology and cognitive science, which allows for a more holistic understanding of cultural phenomena. It demonstrates how spatial generation improves the interpretation of texts, artifacts and rituals, emphasizing the role of ecological context and iconic semiosis. Pointing to the crucial role of space in shaping human experience, the concept of spatial generation offers a tool for future research, innovation and the pursuit of a more comprehensive understanding of the world and a human in it.

S2 Open Access 2025
Semantic increments of colourative terminological elements in modern medical terminology

A.A. Gorzhaya

The aim of the article is to determine the semantic specificity of terminological units and their individual terminological elements, which have a direct or indirect colour indication or its absence. Colourative terminological elements denote such a specific property of medical objects, processes, or phenomena as colour (or lack thereof), and can also have additional connotations in the semantics of colour. In the process of material selection it has been revealed that quite a large number of medical terminological units in the modern Russian language are characterized by the presence of colourative semantics, i.e. the seme of colour designation. The article examines the semantic increments of the terminological elements-colouratives in medical terminology in the modern Russian language. 315 Russian-language medical terminological units with terminological elements-colouratives and more than 350 diverse contexts of their use have served as the material for the study. Research methods include continuous selection, context and content analysis, semantic analysis, descriptive method, as well as methods of quantitative calculation. In the course of the study, it has been determined that the predominant colours used in the nomination of medical objects, processes or phenomena are red, white and black. To a lesser extent, yellow, dark blue / light blue, grey and green colours are represented within the selected terminology units, as well as an indication of the presence of several colours or their total absence. The abovementioned colours are verbalized both by traditional Russian-language terminological elements-colouratives and by ones borrowed from Latin and Greek. Medical terms with colourative elements expressing other colours are presented in a much smaller quantity.
In the course of the work, cases of both direct designation of the colour of a certain medical object or phenomenon, and its indirect, figurative, metaphorical and metonymic expression have been identified. The use of colourative terminology elements in the modern Russian medical terminology plays an important role both in diagnostic terms and in terms of communication between medical specialists.

1 citation en
arXiv Open Access 2025
Model Merging to Maintain Language-Only Performance in Developmentally Plausible Multimodal Models

Ece Takmaz, Lisa Bylinina, Jakub Dotlacil

State-of-the-art vision-and-language models consist of many parameters and learn from enormous datasets, surpassing the amounts of linguistic data that children are exposed to as they acquire a language. This paper presents our approach to the multimodal track of the BabyLM challenge addressing this discrepancy. We develop language-only and multimodal models in low-resource settings using developmentally plausible datasets, with our multimodal models outperforming previous BabyLM baselines. One finding in the multimodal language model literature is that these models tend to underperform in language-only tasks. Therefore, we focus on maintaining language-only abilities in multimodal models. To this end, we experiment with model merging, where we fuse the parameters of multimodal models with those of language-only models using weighted linear interpolation. Our results corroborate the findings that multimodal models underperform in language-only benchmarks that focus on grammar, and model merging with text-only models can help alleviate this problem to some extent, while maintaining multimodal performance.

en cs.CL, cs.CV
arXiv Open Access 2025
Untangling the Influence of Typology, Data and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging

Enora Rice, Ali Marashian, Hannah Haynie et al.

Cross-lingual transfer learning is an invaluable tool for overcoming data scarcity, yet selecting a suitable transfer language remains a challenge. The precise roles of linguistic typology, training data, and model architecture in transfer language choice are not fully understood. We take a holistic approach, examining how both dataset-specific and fine-grained typological features influence transfer language selection for part-of-speech tagging, considering two different sources for morphosyntactic features. While previous work examines these dynamics in the context of bilingual biLSTMs, we extend our analysis to a more modern transfer learning pipeline: zero-shot prediction with pretrained multilingual models. We train a series of transfer language ranking systems and examine how different feature inputs influence ranker performance across architectures. Word overlap, type-token ratio, and genealogical distance emerge as top features across all architectures. Our findings reveal that a combination of typological and dataset-dependent features leads to the best rankings, and that good performance can be obtained with either feature group on its own.

en cs.CL
S2 Open Access 2025
Восьмая международная научная конференция по эллинистике памяти И. И. Ковалевой

К.А. Климова

This review provides an overview of the Eighth International Scientific Conference on Hellenistics in memory of I.I. Kovaleva, held on 16–17 April 2025 at the Department of Byzantine and New Greek Philology, Faculty of Philology, Lomonosov Moscow State University. The conference was attended by 60 specialists from various cities in Russia (Moscow, Saint Petersburg, Petrozavodsk, Krasnodar, Pyatigorsk, Rostov-on-Don), as well as from Greece, Serbia, Turkey and China, who gave presentations on topics dedicated to Classical philology, Byzantine philology, History, Culture, Ethnolinguistics and Folklore, the History of the Greek language, Comparative linguistics, Literature, Translation studies, and the Reception of Antiquity in the modern world.

S2 Open Access 2025
Lost in the Balkans: Differential Place Marking in the Aromanian Varieties

Olivier Winistörfer, Anastasia Escher, Daria Konior

The phenomenon of Differential Place Marking (Haspelmath 2019), also called zero-marking of spatial relations (Stolz et al 2014), has often been mentioned in the languages of the Balkans. Examples of such differential marking have been documented in the Aromanian varieties (Kramer 1981; Caragiu-Marioțeanu 1975), Modern Greek (Holton et al 1997), Macedonian (Koneski 1965), Ancient Greek (Luraghi 2017), and Latin (Haspelmath 2019; Kramer 1981). However, while the presence of Differential Place Marking has been widely acknowledged, detailed descriptions of such patterns in different varieties are still lacking. Our aim is to present and discuss linguistic data from Aromanian and other Balkan Romance varieties (Istroromanian and Meglen Vlach) to better understand the inter- and intra-dialectal variation of Differential Place Marking. We study and compare their occurrences in the linguistic transcripts from different synchronic Aromanian varieties: from Kruševo (Gołąb 1984), Ohrid and Struga (Markoviḱ 2007), and Turia/Kranéa (Bara et al 2005). The results of the comparative analysis suggest that the dialectal and diachronic picture is not uniform. Various semantic factors, such as the type of noun indicating location (proper vs. common) and whether the location is perceived as proximal or distant seem to play a key role.

arXiv Open Access 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions

Akash Ghosh, Arkadeep Acharya, Sriparna Saha et al.

The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this constraint, researchers have endeavored to integrate visual capabilities with LLMs, resulting in the emergence of Vision-Language Models (VLMs). These advanced models are instrumental in tackling more intricate tasks such as image captioning and visual question answering. In our comprehensive survey paper, we delve into the key advancements within the realm of VLMs. Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs and models that both accept and produce multimodal inputs and outputs. This classification is based on their respective capabilities and functionalities in processing and generating various modalities of data. We meticulously dissect each model, offering an extensive analysis of its foundational architecture, training data sources, as well as its strengths and limitations wherever possible, providing readers with a comprehensive understanding of its essential components. We also analyzed the performance of VLMs in various benchmark datasets. By doing so, we aim to offer a nuanced understanding of the diverse landscape of VLMs. Additionally, we underscore potential avenues for future research in this dynamic domain, anticipating further breakthroughs and advancements.

en cs.CV, cs.AI
arXiv Open Access 2024
Do Membership Inference Attacks Work on Large Language Models?

Michael Duan, Anshuman Suri, Niloofar Mireshghallah et al.

Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data. Despite extensive research on traditional machine learning models, there has been limited work studying MIA on the pre-training data of large language models (LLMs). We perform a large-scale evaluation of MIAs over a suite of language models (LMs) trained on the Pile, ranging from 160M to 12B parameters. We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains. Our further analyses reveal that this poor performance can be attributed to (1) the combination of a large dataset and few training iterations, and (2) an inherently fuzzy boundary between members and non-members. We identify specific settings where LLMs have been shown to be vulnerable to membership inference and show that the apparent success in such settings can be attributed to a distribution shift, such as when members and non-members are drawn from the seemingly identical domain but with different temporal ranges. We release our code and data as a unified benchmark package that includes all existing MIAs, supporting future work.

en cs.CL
arXiv Open Access 2024
From N-grams to Pre-trained Multilingual Models For Language Identification

Thapelo Sindane, Vukosi Marivate

In this paper, we investigate the use of N-gram models and Large Pre-trained Multilingual models for Language Identification (LID) across 11 South African languages. For N-gram models, this study shows that effective data size selection remains crucial for establishing effective frequency distributions of the target languages, that efficiently model each language, thus, improving language ranking. For pre-trained multilingual models, we conduct extensive experiments covering a diverse set of massively pre-trained multilingual (PLM) models -- mBERT, RemBERT, XLM-r, and Afri-centric multilingual models -- AfriBERTa, Afro-XLMr, AfroLM, and Serengeti. We further compare these models with available large-scale Language Identification tools: Compact Language Detector v3 (CLD V3), AfroLID, GlotLID, and OpenLID to highlight the importance of focused-based LID. From these, we show that Serengeti is a superior model across models: N-grams to Transformers on average. Moreover, we propose a lightweight BERT-based LID model (za_BERT_lid) trained with NHCLT + Vukzenzele corpus, which performs on par with our best-performing Afri-centric models.

en cs.CL, cs.AI
arXiv Open Access 2024
Behavioral Bias of Vision-Language Models: A Behavioral Finance View

Yuhang Xiao, Yudi Lin, Ming-Chang Chiu

Large Vision-Language Models (LVLMs) evolve rapidly as Large Language Models (LLMs) were equipped with vision modules to create more human-like models. However, we should carefully evaluate their applications in different domains, as they may possess undesired biases. Our work studies the potential behavioral biases of LVLMs from a behavioral finance perspective, an interdisciplinary subject that jointly considers finance and psychology. We propose an end-to-end framework, from data collection to new evaluation metrics, to assess LVLMs' reasoning capabilities and the dynamic behaviors manifested in two established human financial behavioral biases: recency bias and authority bias. Our evaluations find that recent open-source LVLMs such as LLaVA-NeXT, MobileVLM-V2, Mini-Gemini, MiniCPM-Llama3-V 2.5 and Phi-3-vision-128k suffer significantly from these two biases, while the proprietary model GPT-4o is negligibly impacted. Our observations highlight directions in which open-source models can improve. The code is available at https://github.com/mydcxiao/vlm_behavioral_fin.

en cs.CL, cs.AI
S2 Open Access 2024
Ὡς ὁ βασιλεὺς ἄγει. Considerazioni sulle formule di datazione doppia delle iscrizioni greche della Babilonia partica

F. Pompeo

This paper examines two Greek inscriptions written in Babylon during the Parthian period with specific focus on the sections containing double dating formulas. Unfortunately, both inscriptions – particularly the first – are damaged. This paper shows that strong evidence in favor of one of the text reconstructions proposed by scholars can be obtained by means of an analysis combining the methods and tools of philology and historical (socio)linguistics. Evidence from other languages is also considered.

S2 Open Access 2024
Ancient Epic Poetry

Jörg Rüpke, Sofia Bianchi Mancini

Epics are the oldest written long texts in many languages. They claim to recount fundamental events and seek to give validity to their version through formal composition. This volume provides an overview of Greek and Latin epic poetry from Homer to Late Antiquity. But above all it asks: How were they made audible? Who wanted to read or listen to them? And how did this change the texts? The professionalisation of such great works also led to competition, outbidding, but also to parody or condensation of the texts. This book is the first to provide a broad and coherent overview from such a perspective. Jörg Rüpke was Professor of Classical Philology at the University of Potsdam from 1995 to 1999 and Professor of Comparative Religious Studies at the University of Erfurt from 1999 to 2008. Since 2008 he has been Fellow of Religious Studies at the Max Weber Centre in Erfurt. Sofia Bianchi Mancini studied Classics at the University of Wales Trinity St David and completed her doctorate at the University of Erfurt in 2021. Since 2021 she has been a postdoctoral researcher, working on 'Divine Property: Late Antiquity and Medieval Solutions' at the Max Weber Centre in Erfurt.

S2 Open Access 2024
PERIODIZATION OF THE FORMATION OF ENGLISH IN THE CONTEXT OF THE BORROWING PROCESS

L. Vorobiova

Objective. The objective of the article is to identify a periodization of the formation of English in the context of the borrowing process and to analyze the historical and lexical aspects of the integration of borrowings in the context of the formation of the English language. Methods. The main scientific results are obtained using the historical method of theoretical generalization, which makes it possible to determine the nature of the periodization of borrowings, and the comparative method, which is used to compare historical phenomena, events and facts of socio-cultural life and to establish similarities and differences in the adaptation of borrowings at different stages of their integration into English. Results. The theoretical analysis of the nature of the borrowings makes it possible to identify the periods of a possible periodization that enables effective intercultural studies in the fields of linguistics, philology and terminology. Interpretation and analysis of the genesis of possible periods will lead to successful management of the educational process for philology and history students. Four periods are identified. The first period begins with the migration of the Germanic tribes of Angles, Saxons, and Jutes to Britain. During this time, there was a strong influence of Latin and Scandinavian languages due to the conquests. The second period is marked by significant changes in grammar, simplification of cases and the disappearance of many endings. The third period is marked by the Great Vowel Shift, a massive phonetic restructuring of English sounds. This period is associated with the growing influence of the Renaissance and printing, which contributed to the standardization of the English language. The fourth period is characterized by the stabilization of the language structure and its vocabulary.
In addition to the process of borrowing, modern English is characterised by another wave of vocabulary enrichment caused by three main factors: the unprecedented growth of scientific vocabulary and the emergence of the American version of English.
