Results for "Greek philology and language"

arXiv Open Access 2026

CSF: Contrastive Semantic Features for Direct Multilingual Sign Language Generation

Tran Sy Bao

Sign language translation systems typically require English as an intermediary language, creating barriers for non-English speakers in the global deaf community. We present Canonical Semantic Form (CSF), a language-agnostic semantic representation framework that enables direct translation from any source language to sign language without English mediation. CSF decomposes utterances into nine universal semantic slots: event, intent, time, condition, agent, object, location, purpose, and modifier. A key contribution is our comprehensive condition taxonomy comprising 35 condition types across eight semantic categories, enabling nuanced representation of conditional expressions common in everyday communication. We train a lightweight transformer-based extractor (0.74 MB) that achieves 99.03% average slot extraction accuracy across four typologically diverse languages: English, Vietnamese, Japanese, and French. The model performs particularly well on condition classification (99.4% accuracy) despite the 35-class complexity. With an inference latency of 3.02 ms on CPU, our approach enables real-time sign language generation in browser-based applications. We release our code, trained models, and multilingual dataset to support further research in accessible sign language technology.

en cs.CL
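
The abstract lists the nine CSF slots but not their value formats or how "average slot extraction accuracy" is computed. The sketch below works around that gap: it models a CSF frame as a Python dataclass with one optional string per slot and scores predictions by per-slot exact match averaged over the nine slots. That metric definition, the names `CSFFrame`, `SLOT_NAMES` and `slot_accuracy`, and the example slot values are assumptions for illustration; it does not reproduce the released model or the 35-type condition taxonomy.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class CSFFrame:
    """Hypothetical container for the nine CSF slots named in the abstract.

    Values are plain strings; the real CSF value formats and the 35-type
    condition taxonomy are not reproduced here (assumption).
    """
    event: Optional[str] = None
    intent: Optional[str] = None
    time: Optional[str] = None
    condition: Optional[str] = None
    agent: Optional[str] = None
    object: Optional[str] = None
    location: Optional[str] = None
    purpose: Optional[str] = None
    modifier: Optional[str] = None

    def filled(self) -> dict:
        # Only the slots that were actually populated.
        return {k: v for k, v in asdict(self).items() if v is not None}


SLOT_NAMES = [f.name for f in fields(CSFFrame)]


def slot_accuracy(preds, golds):
    # Assumed metric: exact match per slot, then the mean over the nine slots.
    # A slot correctly left empty (None == None) counts as a hit.
    per_slot = {}
    for name in SLOT_NAMES:
        hits = sum(getattr(p, name) == getattr(g, name) for p, g in zip(preds, golds))
        per_slot[name] = hits / len(golds)
    return per_slot, sum(per_slot.values()) / len(SLOT_NAMES)


# Hand-written, illustrative frames for a sentence such as
# "If it rains tomorrow, I will stay home."; the slot values are made up.
gold = CSFFrame(event="stay", time="tomorrow", condition="rain", agent="I", location="home")
pred = CSFFrame(event="stay", time="today", condition="rain", agent="I", location="home")
per_slot, average = slot_accuracy([pred], [gold])
print(gold.filled())
print(round(average, 4))
```

Because the frame carries no language field, the same structure could in principle be filled from an English, Vietnamese, Japanese or French source sentence, which is the property the abstract emphasises.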

arXiv Open Access 2026

Targeted Syntactic Evaluation of Language Models on Georgian Case Alignment

Daniel Gallagher, Gerhard Heyer

This paper evaluates the performance of transformer-based language models on split-ergative case alignment in Georgian, a particularly rare system for assigning grammatical cases to mark argument roles. We focus on subject and object marking determined through various permutations of nominative, ergative, and dative noun forms. A treebank-based approach for the generation of minimal pairs using the Grew query language is implemented. We create a dataset of 370 syntactic tests made up of seven tasks containing 50-70 samples each, where three noun forms are tested in any given sample. Five encoder- and two decoder-only models are evaluated with word- and/or sentence-level accuracy metrics. Regardless of the specific syntactic makeup, models performed worst in assigning the ergative case correctly and strongest in assigning the nominative case correctly. Performance correlated with the overall frequency distribution of the three forms (NOM > DAT > ERG). Though data scarcity is a known issue for low-resource languages, we show that the highly specific role of the ergative along with a lack of available training data likely contributes to poor performance on this case. The dataset is made publicly available and the methodology provides an interesting avenue for future syntactic evaluations of languages where benchmarks are limited.

en cs.CL
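
The abstract reports word- and sentence-level accuracy on minimal pairs without defining them. A common sentence-level definition in minimal-pair benchmarks is the share of pairs in which the model scores the grammatical sentence higher than its ungrammatical counterpart. The sketch below assumes that definition, groups results by the case being tested, and uses invented log-probabilities in place of real model scores; `MinimalPair` and `accuracy_by_case` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MinimalPair:
    case: str            # case under test, e.g. "NOM", "DAT" or "ERG"
    good_logprob: float  # model score for the grammatical sentence
    bad_logprob: float   # model score for its minimally different ungrammatical twin


def accuracy_by_case(pairs):
    # Assumed sentence-level metric: a pair is correct when the grammatical
    # variant receives a strictly higher score than the ungrammatical one.
    totals, hits = defaultdict(int), defaultdict(int)
    for p in pairs:
        totals[p.case] += 1
        hits[p.case] += p.good_logprob > p.bad_logprob
    return {case: hits[case] / totals[case] for case in totals}


# Made-up scores, for illustration only (not results from the paper).
pairs = [
    MinimalPair("NOM", -20.1, -24.3),
    MinimalPair("NOM", -18.7, -21.0),
    MinimalPair("DAT", -22.4, -23.9),
    MinimalPair("DAT", -25.0, -24.2),
    MinimalPair("ERG", -21.3, -20.8),
    MinimalPair("ERG", -19.9, -22.5),
]
print(accuracy_by_case(pairs))
```

Grouping by case is what allows a per-form comparison such as the NOM > DAT > ERG ordering mentioned in the abstract.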

DOAJ Open Access 2025

Decree of the deme of Halai Aixonides for the priest Polystratos and other benefactors

Negro, Silvia

The decree, inscribed on an offering table (trapeza) found in the sanctuary of Apollo Zoster in the Attic deme of Halai Aixonides, dates to 360–350 BC and prescribes honours for the priest Polystratos son of Charmantides Halaieus and for a group of four other demesmen, chosen (hairethentes) to assist him in caring for the sanctuary. The document provides information on cult personnel and on some cult practices, including the celebration of the Zosteria.

Ancient history, Greek philology and language

DOAJ Open Access 2025

Decree for the republication of a proxeny stele destroyed by the Thirty

Fadda Cao, Francesco

The stele preserves a decree of the Athenian boule ordering the republication of the proxenia granted to the five sons of Apemantos, because the previous stele had been destroyed by the Thirty (404/3 BC). The request was made by one of the sons, Eurypilos, to whom the expenses were assigned, along with an invitation to the Prytaneion. By comparison with other epigraphic evidence concerning Apemantos and his sons, it is also possible to reconstruct the family's lasting political commitment to the democratic, pro-Athenian faction on the island of Thasos.

Ancient history, Greek philology and language

DOAJ Open Access 2025

Bundeswettbewerb Fremdsprachen in Bochum: Latin and Greek

Daniel Teubner

Greek language and literature. Latin language and literature, Philology. Linguistics

arXiv Open Access 2025

EuroGEST: Investigating gender stereotypes in multilingual language models

Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou et al.

Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric. We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages. EuroGEST builds on an existing expert-informed benchmark covering 16 gender stereotypes, expanded in this work using translation tools, quality estimation metrics, and morphological heuristics. Human evaluations confirm that our data generation method results in high accuracy of both translations and gender labels across languages. We use EuroGEST to evaluate 24 multilingual language models from six model families, demonstrating that the strongest stereotypes in all models across all languages are that women are 'beautiful', 'empathetic' and 'neat' and men are 'leaders', 'strong, tough' and 'professional'. We also show that larger models encode gendered stereotypes more strongly and that instruction finetuning does not consistently reduce gendered stereotypes. Our work highlights the need for more multilingual studies of fairness in LLMs and offers scalable methods and resources to audit gender bias across languages.

en cs.CL

DOAJ Open Access 2024

Karl Galinsky (7. 2. 1942 - 9. 3. 2024), in memoriam

Pablo Martínez Astorino

Philology. Linguistics, Greek language and literature. Latin language and literature

arXiv Open Access 2024

Misgendering and Assuming Gender in Machine Translation when Working with Low-Resource Languages

Sourojit Ghosh, Srishti Chatterjee

This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. We begin by explaining what low-resource languages are, examining the inseparable social and computational factors that create such linguistic hierarchies. Through a case study of our mother tongue, Bengali, a global language spoken by almost 300 million people but still classified as low-resource, we demonstrate how gender is assumed and inferred in translations to and from the high(est)-resource English when no such information is provided in source texts. We discuss the postcolonial and societal impacts of such errors, which lead to linguistic erasure and representational harms, and conclude by discussing potential solutions for uplifting languages by giving them more agency in MT conversations.

en cs.CL

arXiv Open Access 2024

Specific language impairment (SLI) detection pipeline from transcriptions of spontaneous narratives

Santiago Arena, Antonio Quintero-Rincón

Specific Language Impairment (SLI) is a disorder that affects communication and can impair both comprehension and expression. This study focuses on effectively detecting SLI in children using transcripts of spontaneous narratives from 1063 interviews. A three-stage cascading pipeline was proposed. In the first stage, feature extraction and dimensionality reduction are performed using Random Forest (RF) and Spearman correlation methods. In the second stage, logistic regression estimates which of the variables retained in the first stage are the most predictive. In the last stage, these variables are used by a nearest-neighbor classifier to detect SLI in children from transcripts of spontaneous narratives. The results revealed an accuracy of 97.13% in identifying SLI, highlighting aspects such as response length, utterance quality, and language complexity. This new approach, framed in natural language processing, offers significant benefits to the field of SLI detection by avoiding complex subjective variables and focusing on quantitative metrics directly related to the child's performance.

en cs.CL, cs.LG
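
The shape of the three-stage cascade described above can be sketched with the Python standard library alone. This is a simplified illustration, not the authors' pipeline: the Random Forest ranking from stage one is omitted, so stage one keeps only a Spearman-based redundancy filter; stage two fits a plain logistic regression by gradient descent and keeps the features with the largest absolute standardized weights; stage three is a k-nearest-neighbour vote. All thresholds, function names and the toy data are assumptions.

```python
import math
from statistics import mean, pstdev

def ranks(xs):
    # 1-based ranks; tied values share their average rank.
    order = sorted(range(len(xs)), key=xs.__getitem__)
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho: the Pearson correlation of the two rank vectors.
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den if den else 0.0

def scale(row, stats):
    # Apply stored (mean, sd) pairs so queries use the training statistics.
    return [(v - m) / s for v, (m, s) in zip(row, stats)]

def standardize(X):
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*X)]
    return [scale(row, stats) for row in X], stats

def stage1_drop_redundant(X, threshold=0.9):
    # Keep a feature only if it is not strongly rank-correlated with one already kept.
    cols = list(zip(*X))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(spearman(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def stage2_rank_by_logistic(Z, y, features, top_k=1, lr=0.1, epochs=500):
    # Logistic regression by batch gradient descent on standardized features;
    # return the top_k features with the largest absolute weights.
    w, b, n = [0.0] * len(features), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(features), 0.0
        for row, t in zip(Z, y):
            err = sigmoid(b + sum(wi * row[f] for wi, f in zip(w, features))) - t
            gw = [g + err * row[f] for g, f in zip(gw, features)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    ranked = sorted(zip(features, w), key=lambda fw: -abs(fw[1]))
    return [f for f, _ in ranked[:top_k]]

def stage3_knn(Z, y, z_query, features, k=3):
    # Majority vote among the k nearest training rows on the selected features.
    nearest = sorted(
        (math.dist([row[f] for f in features], [z_query[f] for f in features]), t)
        for row, t in zip(Z, y)
    )[:k]
    votes = [t for _, t in nearest]
    return max(set(votes), key=votes.count)

def cascade(X, y, query, threshold=0.9, top_k=1, k=3):
    kept = stage1_drop_redundant(X, threshold)
    Z, stats = standardize(X)
    chosen = stage2_rank_by_logistic(Z, y, kept, top_k)
    return stage3_knn(Z, y, scale(query, stats), chosen, k), kept, chosen

# Toy data: column 1 is twice column 0 (stage 1 drops it), column 2 is noise.
X = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [7, 14, 5], [8, 16, 3], [9, 18, 4]]
y = [0, 0, 0, 1, 1, 1]
label, kept, chosen = cascade(X, y, [8.6, 17.2, 4])
print(label, kept, chosen)
```

Standardizing before stage two is a deliberate choice here: without it, logistic-regression weights depend on each feature's units and cannot be compared to rank features.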

arXiv Open Access 2024

Facilitating large language model Russian adaptation with Learned Embedding Propagation

Mikhail Tikhomirov, Daniil Chernyshev

Rapid advances in large language model (LLM) technologies have led to powerful open-source instruction-tuned LLMs whose text generation quality matches state-of-the-art counterparts such as GPT-4. While such models accelerate the adoption of LLM technologies in sensitive-information environments, their authors do not disclose the training data needed to replicate the results, making the achievements model-exclusive. Since these open-source models are also multilingual, the benefits of training a language-specific LLM shrink, with improved inference computation efficiency becoming the only guaranteed advantage of such a costly procedure. More cost-efficient options, such as vocabulary extension followed by continued pre-training, are also inhibited by the lack of access to high-quality instruction-tuning data, since these data are the major factor behind the resulting LLM's task-solving capabilities. To address these limitations and cut the costs of the language adaptation pipeline, we propose Learned Embedding Propagation (LEP). Unlike existing approaches, our method has lower training data size requirements due to its minimal impact on existing LLM knowledge, which we reinforce using a novel ad-hoc embedding propagation procedure that makes it possible to skip the instruction-tuning step and instead implant the new language knowledge directly into any existing instruct-tuned variant. We evaluated four Russian vocabulary adaptations for LLaMa-3-8B and Mistral-7B, showing that LEP is competitive with traditional instruction-tuning methods, achieving performance comparable to OpenChat 3.5 and LLaMa-3-8B-Instruct, with self-calibration and continued tuning further improving task-solving capabilities.

en cs.CL, cs.AI

DOAJ Open Access 2023

A sacred law from Smyrna from the sanctuary of an unknown female deity

Sorbello, Francesco

The inscription, found in Smyrna, is dated between the end of the 2nd and the 1st century BC. It sets out a series of regulations meant to protect the property of the sanctuary of an anonymous goddess, in particular her sacred fish and the facility for rearing them (ichthyotrophion). On the basis of these elements, it can be hypothesised that she is the Syrian Atargatis, whose cult is described by authors such as Lucian of Samosata and Aelian, who record the goddess's connection with fish and mention the existence of fishponds in her sanctuaries. In the Hellenistic period, the cult of the Syrian gods spread widely throughout the Greek world, imported by communities of Syro-Phoenician negotiatores who settled in the main port cities of the Aegean, where, as at Delos, they founded sanctuaries and cult associations.

Ancient history, Greek philology and language

arXiv Open Access 2023

Text classification dataset and analysis for Uzbek language

Elmurod Kuriyozov, Ulugbek Salaev, Sanatbek Matlatipov et al.

Text classification is an important task in Natural Language Processing (NLP), where the goal is to categorize text data into predefined classes. In this study, we analyse the dataset creation steps and evaluation techniques of a multi-label news categorisation task as part of text classification. We first present a newly obtained dataset for Uzbek text classification, collected from 10 different news and press websites and covering 15 categories of news, press and law texts. We also present a comprehensive evaluation of different models, ranging from traditional bag-of-words models to deep learning architectures, on this newly created dataset. Our experiments show that the Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) based models outperform the rule-based models. The best performance is achieved by the BERTbek model, a transformer-based BERT model trained on an Uzbek corpus. Our findings provide a good baseline for further research in Uzbek text classification.

en cs.CL
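
As a reference point for the "traditional bag-of-words models" mentioned above, a multinomial Naive Bayes classifier is one typical baseline; the abstract does not say which bag-of-words models were used, so treat this choice as an assumption. The sketch is single-label (the abstract describes a multi-label task), uses lowercase whitespace tokenization, and trains on a made-up four-document corpus with two invented category labels; `BowNaiveBayes` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization after lowercasing (a simplification).
    return text.lower().split()


class BowNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, single-label."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n in self.doc_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior of the class
            for w in tokenize(text):
                # Add-one (Laplace) smoothing over the training vocabulary.
                score += math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training set with a few Uzbek words and two invented categories.
texts = ["futbol jamoa o'yin", "jamoa futbol g'alaba", "qonun sud modda", "sud qonun"]
labels = ["sport", "sport", "law", "law"]
model = BowNaiveBayes().fit(texts, labels)
print(model.predict("futbol o'yin"), model.predict("yangi qonun"))
```

A multi-label variant would typically train one binary classifier per category and keep every category whose score passes a threshold, rather than taking a single argmax.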

DOAJ Open Access 2022

Obituary for Hellmut Flashar

Susanne Aretz, Bernard Andreae, Christiane Zimmermann et al.

Greek language and literature. Latin language and literature, Philology. Linguistics

DOAJ Open Access 2022

Review: A. Sirchich von Kis, Catull, Carmina, Göttingen 2021

Jan Janko Janković

Greek language and literature. Latin language and literature, Philology. Linguistics

DOAJ Open Access 2022

Paul Kretschmer's expedition of 1901 to Lesbos and the local network of teachers: the example of the grammar school teacher Michail K. Stefanidis

Stratos Nikolaros

Research on the expedition of Paul Kretschmer (1866–1956) to Lesbos in 1901 brought into focus the person of Michail K. Stefanidis (1868–1957), who was a scientist, a professor at the University of Athens, and an ordinary member of the Athens Academy. The most important person in Kretschmer's network on Lesbos, Stefanidis supported the Viennese professor in collecting his linguistic material and evaluating the data. As a result of the expedition, Kretschmer published a monograph on the dialect of Lesbos in 1905. This study analyses Stefanidis' various activities during the expedition, when he was in the early stages of his career as a grammar school teacher in Mytilini. To this end, previously unknown archive material, such as travel diaries and historical photographs, was investigated. This valuable material is held in Kretschmer's estate in the Austrian National Library and the Austrian Academy of Sciences.

History of Greece, Translating and interpreting

arXiv Open Access 2022

Measuring Harmful Representations in Scandinavian Language Models

Samia Touileb, Debora Nozza

Scandinavian countries are perceived as role models when it comes to gender equality. With the advent of pre-trained language models and their widespread use, we investigate to what extent gender-based harmful and toxic content exists in selected Scandinavian language models. We examine nine models, covering Danish, Swedish, and Norwegian, by manually creating template-based sentences and probing the models for completion. We evaluate the completions using two methods for measuring harmful and toxic completions and provide a thorough analysis of the results. We show that Scandinavian pre-trained language models contain harmful and gender-based stereotypes with similar values across all languages. This finding runs counter to general expectations about gender equality in Scandinavian countries and shows the possible problematic outcomes of using such models in real-world settings.

en cs.CL
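
Template-based probing of the kind described above starts by crossing sentence templates with gendered subjects, so that completions for the two groups can be compared on otherwise identical prompts. The sketch assumes two Norwegian templates of our own invention and a BERT-style `[MASK]` token; the paper's actual templates, its nine models and its two harmfulness measures are not reproduced, and `build_probes` is a hypothetical helper.

```python
from itertools import product

# Hypothetical Norwegian templates; "[MASK]" assumes a BERT-style masked LM,
# and other models would need their own mask token or a prefix prompt instead.
TEMPLATES = ["{subject} jobber som [MASK].", "{subject} er kjent for å være [MASK]."]
SUBJECTS = {"Kvinnen": "female", "Mannen": "male"}  # "the woman", "the man"


def build_probes(templates, subjects):
    # Cross every template with every subject and keep the gender label,
    # so completions can later be compared across the two groups.
    return [
        {"prompt": t.format(subject=s), "group": g}
        for t, (s, g) in product(templates, subjects.items())
    ]


probes = build_probes(TEMPLATES, SUBJECTS)
for p in probes:
    print(p["group"], p["prompt"])
```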

arXiv Open Access 2022

Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer

Benjamin Muller, Deepanshu Gupta, Siddharth Patwardhan et al.

Multilingual language models (LMs), such as mBERT, XLM-R, mT5, and mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model, separately and independently of one another, and the model appears to learn cross-lingual connections implicitly. This raises several questions that motivate our study: Are the cross-lingual connections between every language pair equally strong? What properties of the source and target languages affect the strength of cross-lingual transfer? Can we quantify the impact of those properties on cross-lingual transfer? In our investigation, we analyze a pre-trained mT5 to discover the attributes of the cross-lingual connections learned by the model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret the cross-lingual understanding of the mT5 model. Using them, one can choose the best source language for a task and anticipate its training data demands. A key finding of this work is that similarities in syntax, morphology, and phonology are good predictors of cross-lingual transfer, significantly more so than lexical similarity alone. For a given language, we are able to predict zero-shot performance, and we find that performance increases on a logarithmic scale with the number of few-shot target-language data points.

en cs.CL, cs.AI
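
The logarithmic trend in the last sentence of the abstract can be written as score(n) ≈ a + b·ln(n), where n is the number of few-shot target-language examples. The sketch below fits a and b by ordinary least squares on made-up points; it illustrates the functional form only and is not the paper's statistical interpretation framework. `fit_log_curve` is a hypothetical name.

```python
import math


def fit_log_curve(ns, scores):
    # Ordinary least squares for score ≈ a + b * ln(n), with n = number of
    # few-shot target-language examples (n must be positive).
    xs = [math.log(n) for n in ns]
    mx, my = sum(xs) / len(xs), sum(scores) / len(scores)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, scores)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


# Made-up points that lie exactly on 40 + 5 ln(n), for illustration only.
ns = [10, 100, 1000, 10000]
scores = [40 + 5 * math.log(n) for n in ns]
a, b = fit_log_curve(ns, scores)
print(round(a, 6), round(b, 6))
# Under this model every tenfold increase in n adds b * ln(10) points.
print(round(b * math.log(10), 3))
```

The practical reading of such a curve is diminishing returns: going from 100 to 1,000 examples buys the same gain as going from 1,000 to 10,000.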

arXiv Open Access 2022

MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages

Gokul Karthik Kumar, Abhishek Singh Gehlot, Sahal Shaji Mullappilly et al.

The accuracy of English-language Question Answering (QA) systems has improved significantly in recent years with the advent of Transformer-based models (e.g., BERT). These models are pre-trained in a self-supervised fashion on a large English text corpus and further fine-tuned on a massive English QA dataset (e.g., SQuAD). However, QA datasets on such a scale are not available for most other languages. Multilingual BERT-based models (mBERT) are often used to transfer knowledge from high-resource to low-resource languages. Since these models are pre-trained on huge text corpora containing multiple languages, they typically learn language-agnostic embeddings for tokens from different languages. However, directly training an mBERT-based QA system for low-resource languages is challenging due to the paucity of training data. In this work, we augment the QA samples of the target language using translation and transliteration into other languages and use the augmented data to fine-tune an mBERT-based QA model that has already been pre-trained in English. Experiments on the Google ChAII dataset show that fine-tuning the mBERT model with translations from the same language family boosts question-answering performance, whereas translations from other language families degrade it. We further show that introducing a contrastive loss between the translated question-context feature pairs during fine-tuning prevents this degradation with cross-family translations and leads to a marginal improvement. The code for this work is available at https://github.com/gokulkarthik/mucot.

en cs.CL, cs.AI
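
The abstract does not spell out the form of its contrastive loss. One standard choice for aligning paired features is an InfoNCE-style objective, in which each original question-context feature must pick out its own translation among all translations in the batch. The sketch below assumes that form, cosine similarity and a temperature of 0.1, and uses toy 2-D vectors instead of mBERT features; `info_nce` is a hypothetical name, not the repository's code.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, temperature=0.1):
    # anchors[i] and positives[i] are features of the same question-context pair
    # (original vs. translated); every positives[j], j != i, acts as a negative.
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


# Toy 2-D features: aligned pairs should give a lower loss than swapped ones.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(anchors, aligned) < info_nce(anchors, swapped))
```

During fine-tuning, a term like this would be added to the usual QA span loss, pulling translated features toward their originals so that translations from other language families are less likely to pull the representation away.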

DOAJ Open Access 2020

Of girls of gold and men of iron: a review on the Golden Dawn Girls documentary and the current predicament

Maria Paschalina Dimopoulou

History of Greece, Translating and interpreting

DOAJ Open Access 2020

Some observations on Greek popular worship and the traditional religiosity of the Greek people

Emmanouil Varvounis

Man's relationship to the beyond and the supernatural, as well as the systematisation of humanity's corresponding pursuit of it in religions and the elaboration of organised rituals for expressing these convictions and worshipping the divine, are realities lost in the beginnings of human presence on earth. Indeed, the specialisation of these perceptions and rituals, and the concomitant appearance and shaping of a distinct priestly order, led to a delineation between official and folk worship: the former is studied by theology and the latter by folklore studies, specifically the branch of "religious folklore". For these reasons, the relevant literature is constantly expanding and folklore studies in this area are presently flourishing. This will continue, since people never stop being culturally creative, adopting new viewpoints and holding events in which these forms answer relevant psychological needs. Because of this, "religious folklore" constitutes a constantly developing branch of folklore studies with great prospects for the future and room for many young academics to carry out research. This paper examines some aspects of the main forms of Greek popular religiosity.

History of Greece, Translating and interpreting
