CSF: Contrastive Semantic Features for Direct Multilingual Sign Language Generation
Tran Sy Bao
Sign language translation systems typically require English as an intermediary language, creating barriers for non-English speakers in the global deaf community. We present Canonical Semantic Form (CSF), a language-agnostic semantic representation framework that enables direct translation from any source language to sign language without English mediation. CSF decomposes utterances into nine universal semantic slots: event, intent, time, condition, agent, object, location, purpose, and modifier. A key contribution is our comprehensive condition taxonomy comprising 35 condition types across eight semantic categories, enabling nuanced representation of conditional expressions common in everyday communication. We train a lightweight transformer-based extractor (0.74 MB) that achieves 99.03% average slot extraction accuracy across four typologically diverse languages: English, Vietnamese, Japanese, and French. The model demonstrates particularly strong performance on condition classification (99.4% accuracy) despite the 35-class complexity. With inference latency of 3.02ms on CPU, our approach enables real-time sign language generation in browser-based applications. We release our code, trained models, and multilingual dataset to support further research in accessible sign language technology.
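The nine semantic slots listed in the abstract suggest a simple record structure. Below is a minimal, purely illustrative sketch of what such a slot-based representation might look like; the class name, field types, and the example condition label are assumptions, not the authors' actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of a nine-slot Canonical Semantic Form record.
# Field names follow the slots named in the abstract; types and
# defaults are illustrative assumptions.
@dataclass
class CSF:
    event: str                       # core predicate, e.g. "stay"
    intent: Optional[str] = None     # e.g. statement / question / command
    time: Optional[str] = None       # e.g. "tomorrow"
    condition: Optional[str] = None  # one of the 35 condition types
    agent: Optional[str] = None
    object: Optional[str] = None
    location: Optional[str] = None
    purpose: Optional[str] = None
    modifiers: List[str] = field(default_factory=list)

# "If it rains tomorrow, I will stay home."
utterance = CSF(event="stay", intent="statement", time="tomorrow",
                condition="weather-hypothetical", agent="I", location="home")
```

A downstream sign-generation component could then consume these slots directly, independent of the source language.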
Competition "From the World of the Greeks": Grades 6 to 8
Aurelius Stöppelkamp
Greek language and literature. Latin language and literature, Philology. Linguistics
PolyTruth: Multilingual Disinformation Detection using Transformer-Based Language Models
Zaur Gouliev, Jennifer Waters, Chengqian Wang
Disinformation spreads rapidly across linguistic boundaries, yet most AI models are still benchmarked only on English. We address this gap with a systematic comparison of five multilingual transformer models: mBERT, XLM, XLM-RoBERTa, RemBERT, and mT5, on a common fake-vs-true classification task. While transformer-based language models have demonstrated notable success in detecting disinformation in English, their effectiveness in multilingual contexts remains an open question. To facilitate evaluation, we introduce the PolyTruth Disinfo Corpus, a novel corpus of 60,486 statement pairs (false claim vs. factual correction) spanning over twenty-five languages that collectively cover five language families and a broad topical range including politics, health, climate, finance, and conspiracy; half of the statements are fact-checked disinformation claims verified against an augmented MindBugs Discovery dataset. Our experiments reveal clear performance variations: RemBERT achieved the best overall accuracy, particularly excelling in low-resource languages, whereas mBERT and XLM exhibited considerable limitations when training data is scarce. We discuss these performance patterns and their implications for real-world deployment. The dataset is publicly available on our GitHub repository to encourage further experimentation and advancement. Our findings illuminate both the potential and the current limitations of AI systems for multilingual disinformation detection.
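The corpus described above pairs each false claim with a factual correction. A minimal sketch of how such pairs could be flattened into a labeled binary dataset for a fake-vs-true classifier is shown below; the pair schema and label convention are illustrative assumptions, not the corpus's published format.

```python
# Minimal sketch: turn (false claim, factual correction) pairs into a
# labeled binary dataset of the kind a fake-vs-true classifier consumes.
# The schema and the 0/1 label convention are illustrative assumptions.
def pairs_to_examples(pairs):
    examples = []
    for claim, correction in pairs:
        examples.append({"text": claim, "label": 0})       # disinformation
        examples.append({"text": correction, "label": 1})  # factual
    return examples

pairs = [("The vaccine contains microchips.",
          "Vaccines contain no tracking devices.")]
data = pairs_to_examples(pairs)  # two examples, one per class
```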
T.K. Papatsonis: Cold War Catholic?
David Ricks
The tragic period of the Civil War and Cold War in Greece generated much poetry of lasting value, mostly from the Left. The poet T. K. Papatsonis (1895–1976), a figure with (among Greek poets) an idiosyncratic political and religious perspective, produced a response of quite a different kind: an 'instant poem' written ira et studio as soon as the show trial of the Hungarian Joseph Cardinal Mindszenty (1949) was concluded. The present discussion provides a reading of Papatsonis' unusual poem within its Cold War context, with attention to its allegiances and its possible contradictions.
History of Greece, Translating and interpreting
End-to-End Graph Flattening Method for Large Language Models
Bin Hong, Jinze Wu, Jiayu Liu
et al.
In recent years, the breakthrough of Large Language Models (LLMs) has offered new ways of achieving universal methods on graph data. The common practice of converting graphs into natural language for LLMs, referred to as graph flattening, exhibits good generalizability and interpretability. However, poorly organized textual formats lead to poor performance in long-distance scenario understanding. Inspired by human cognitive reasoning habits, we propose a novel graph flattening method tailored to LLMs, termed End-to-End DAG-Path prompting (EEDP). Experiments on real-world datasets show that EEDP enhances the reasoning performance of LLMs in long-distance scenarios while maintaining excellent performance in short-distance scenarios, demonstrating good robustness in the face of distance variations.
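The core idea of graph flattening is to serialize graph structure as text an LLM can read. A simplified illustration, assuming a DAG given as an adjacency dict, is to enumerate root-to-leaf paths and render each as a textual chain; this mirrors the spirit of DAG-path prompting but is not the paper's exact method.

```python
# Simplified illustration of graph flattening: enumerate root-to-leaf
# paths of a DAG and render each path as a textual chain for an LLM
# prompt. Not the authors' exact EEDP scheme, just the underlying idea.
def dag_paths(graph, node, path=None):
    path = (path or []) + [node]
    children = graph.get(node, [])
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(dag_paths(graph, child, path))
    return paths

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
flattened = [" -> ".join(p) for p in dag_paths(graph, "A")]
# e.g. ["A -> B -> D", "A -> C -> D"]
```

Rendering full paths rather than isolated edges keeps long-distance relations (here, A's two-hop connection to D) visible in a single line of the prompt.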
Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias
Jayanta Sadhu, Maneesha Rani Saha, Rifat Shahriyar
The rapid growth of Large Language Models (LLMs) has made the study of their biases a crucial field. It is important to assess the influence of the different types of bias embedded in LLMs to ensure fair use in sensitive fields. Although there has been extensive work on bias assessment in English, such efforts are scarce for a major language like Bangla. In this work, we examine two types of social bias in LLM-generated outputs for the Bangla language. Our main contributions are: (1) bias studies on two different social biases for Bangla, (2) a curated dataset for bias measurement benchmarking, and (3) an evaluation of two different probing techniques for bias detection in the context of Bangla. To the best of our knowledge, this is the first such study of bias in LLMs for Bangla. All our code and resources are publicly available to support further bias-related research in Bangla NLP.
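One common probing technique for this kind of study is template-based probing: generating minimally differing prompt pairs whose model scores can then be compared. The sketch below is a hedged illustration of that general technique, not the paper's actual probes; templates are shown in English for readability, whereas the study targets Bangla.

```python
# Hedged sketch of template-based bias probing: build minimally
# differing prompt pairs that vary only in the group term, so that a
# model's scores for the two variants can be compared downstream.
# Templates are illustrative (English here; the study targets Bangla).
def make_probe_pairs(templates, group_a, group_b):
    return [(t.format(x=a), t.format(x=b))
            for t in templates
            for a, b in zip(group_a, group_b)]

templates = ["{x} is good at mathematics.", "{x} should stay at home."]
pairs = make_probe_pairs(templates, ["He"], ["She"])
# [("He is good at mathematics.", "She is good at mathematics."), ...]
```

A systematic score gap between the paired variants is then taken as evidence of bias along that axis.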
Misgendering and Assuming Gender in Machine Translation when Working with Low-Resource Languages
Sourojit Ghosh, Srishti Chatterjee
This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. We begin by explaining what low-resource languages are, examining the inseparable social and computational factors that create such linguistic hierarchies. We demonstrate through a case study of our mother tongue Bengali, a global language spoken by almost 300 million people but still classified as low-resource, how gender is assumed and inferred in translations to and from the high(est)-resource English when no such information is provided in source texts. We discuss the postcolonial and societal impacts of such errors leading to linguistic erasure and representational harms, and conclude by discussing potential solutions towards uplifting languages by providing them more agency in MT conversations.
Validation of the Scientific Literature via Chemputation Augmented by Large Language Models
Sebastian Pagel, Michael Jirasek, Leroy Cronin
Chemputation is the process of programming chemical robots to run experiments using a universal symbolic language, but the literature can be error-prone and hard to read due to ambiguities. Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains, including natural language processing, robotic control, and, more recently, chemistry. Despite significant advancements in standardizing the reporting and collection of synthetic chemistry data, the automatic reproduction of reported syntheses remains a labour-intensive task. In this work, we introduce an LLM-based chemical research agent workflow designed for the automatic validation of synthetic literature procedures. Our workflow can autonomously extract synthetic procedures and analytical data from extensive documents, translate these procedures into universal XDL code, simulate the execution of the procedure in a hardware-specific setup, and ultimately execute the procedure on an XDL-controlled robotic system for synthetic chemistry. This demonstrates the potential of LLM-based workflows for autonomous chemical synthesis with Chemputers. Due to the abstraction of XDL, this approach is safe, secure, and scalable, since hallucinations will not be chemputable and the XDL can be both verified and encrypted. Unlike previous efforts, which either addressed only a limited portion of the workflow, relied on inflexible hard-coded rules, or lacked validation on physical systems, our approach provides four realistic examples of syntheses executed directly from the synthetic literature. We anticipate that our workflow will significantly enhance automation in robotically driven synthetic chemistry research, streamline data extraction, and improve the reproducibility, scalability, and safety of synthetic and experimental chemistry.
Czech literature in Greek translations
Marie Urbanová
Based on a bibliography of translations from Czech into Modern Greek created by the author of the article, the text offers insights into the presence of Czech literature on the Greek book market since the beginning of the 20th century. It first briefly compares the book markets in both countries and then analyses the content of the bibliography from the following angles: the existing translations are plotted on a timeline, with a historical explanation of the resulting pattern, and an overall commentary is given on the publishers involved in producing these translations, the source languages with which the translators worked, and the subsequent editions and new translations that have appeared in Greece. The final chapter presents the most significant translations and the possible motivations for their creation, the most translated authors, and the most active or otherwise important translators.
History of Greece, Translating and interpreting
Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages
Bhavyajeet Singh, Pavan Kandru, Anubhav Sharma
et al.
Massive knowledge graphs like Wikidata attempt to capture world knowledge about multiple entities. Recent approaches concentrate on automatically enriching these KGs from text. However, much information present as natural text in low-resource languages is missed. Cross-lingual information extraction aims at extracting factual information in the form of English triples from low-resource Indian-language text. Despite its massive potential, progress on this task lags behind monolingual information extraction. In this paper, we propose the task of Cross-Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, achieving an overall F1 score of 77.46.
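Since CLFE is scored with F1 over extracted English triples, a minimal sketch of the scoring step helps make the metric concrete. The exact-match set scoring below is an illustrative assumption (the paper's exact matching criteria may differ), and the example entities are made up.

```python
# Minimal sketch: exact-match F1 over predicted vs. gold
# (subject, relation, object) triples. Set-based exact matching is an
# illustrative assumption, not necessarily the paper's exact metric.
def triple_f1(predicted, gold):
    p, g = set(predicted), set(gold)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)

gold = [("Kalpana Chawla", "occupation", "astronaut"),
        ("Kalpana Chawla", "born_in", "Karnal")]
pred = [("Kalpana Chawla", "occupation", "astronaut")]
score = triple_f1(pred, gold)  # precision 1.0, recall 0.5 -> F1 = 2/3
```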
Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding
Bram M. A. van Dijk, Tom Kouwenhoven, Marco R. Spruit
et al.
Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.
Exploring the Landscape of Natural Language Processing Research
Tim Schopf, Karim Arabi, Florian Matthes
As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the increasing research work in this area, several NLP-related approaches have been surveyed in the research community. However, a comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent. Contributing to closing this gap, we have systematically classified and analyzed research papers in the ACL Anthology. As a result, we present a structured overview of the research landscape, provide a taxonomy of fields of study in NLP, analyze recent developments in NLP, summarize our findings, and highlight directions for future work.
Evaluation of the Implementation of the Guided Inquiry Learning Model in Chemistry Learning: A Systematic Literature Review
Ainayya Almira, Anisah Rachmawati, Insi Norma Jelita
et al.
The aim of this research is to give chemistry teachers and education researchers insight into the effectiveness of the guided inquiry learning model and to provide direction for further research in this field. The method used is a Systematic Literature Review (SLR), compiling and evaluating research related to the guided inquiry learning model. The review covers articles discussing the application of this model in chemistry learning, exploring its definition, application, strengths, weaknesses, and effectiveness. The results show that the model can be applied in both the theoretical and practical aspects of chemistry learning. Its advantages include actively involving students, increasing learning independence, and giving students the opportunity to discuss and find answers on their own; students who learn with this model tend to achieve higher learning outcomes. Its disadvantages include the time required to implement the model and the difficulty of working with students who are not yet familiar with this approach.
Text Comprehension through Vocabulary Work (First Year of Learning)
Lena Warzog
Greek language and literature. Latin language and literature, Philology. Linguistics
Controlling Translation Formality Using Pre-trained Multilingual Language Models
Elijah Rippeth, Sweta Agrawal, Marine Carpuat
This paper describes the University of Maryland's submission to the Special Task on Formality Control for Spoken Language Translation at IWSLT, which evaluates translation from English into 6 languages with diverse grammatical formality markers. We investigate to what extent this problem can be addressed with a single multilingual model, simultaneously controlling its output for target language and formality. Results show that this strategy can approach the translation quality and formality control achieved by dedicated translation models. However, the nature of the underlying pre-trained language model and of the finetuning samples greatly impact results.
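A common way to let one multilingual model control both target language and formality is to prepend control tags to the source text before encoding. The sketch below illustrates that general technique; the tag format is an assumption for illustration, not UMD's actual scheme.

```python
# Hedged sketch: steer a single multilingual model by prepending
# control tags for target language and formality to the source text.
# The "<2xx>" / "<formal>" tag format is an illustrative assumption,
# not the submission's actual tagging scheme.
def tag_input(source, target_lang, formality):
    return f"<2{target_lang}> <{formality}> {source}"

tagged = tag_input("How are you?", "de", "formal")
# "<2de> <formal> How are you?"
```

At finetuning time, the model sees tagged inputs paired with references of the matching formality, so at inference the tags alone steer the output register.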
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan
et al.
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding, mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.
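Since the abstract highlights five-shot evaluation, a minimal sketch of how a k-shot prompt is typically assembled for an autoregressive LM may be useful; the Q:/A: formatting convention below is an illustrative assumption, not the paper's evaluation harness.

```python
# Minimal sketch of building a k-shot prompt for an autoregressive LM
# such as GPT-NeoX-20B. The Q:/A: convention is illustrative; actual
# evaluation harnesses use task-specific templates.
def few_shot_prompt(examples, query):
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {query}\nA:"

examples = [("2 + 2 = ?", "4"), ("3 + 5 = ?", "8")]
prompt = few_shot_prompt(examples, "7 + 6 = ?")
# The model is then asked to continue the text after the final "A:".
```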
Persuasive Natural Language Generation -- A Literature Review
Sebastian Duerr, Peter A. Gloor
This literature review focuses on the use of Natural Language Generation (NLG) to automatically detect and generate persuasive texts. Extending previous research on the automatic identification of persuasion in text, we concentrate on generative aspects by conceptualizing determinants of persuasion in five business-focused categories: benevolence, linguistic appropriacy, logical argumentation, trustworthiness, and tools and datasets. These allow NLG to increase an existing message's persuasiveness. Previous research illustrates key aspects in each of these five categories, and a research agenda for the further study of persuasive NLG is developed. The review analyzes seventy-seven articles, outlining the existing body of knowledge and showing the steady progress in this research field.
Modeling the Music Genre Perception across Language-Bound Cultures
Elena V. Epure, Guillaume Salha, Manuel Moussallam
et al.
Music genre perception, as expressed through human annotations of artists or albums, varies significantly across language-bound cultures. These variations cannot be modeled as mere translations, since we must also account for cultural differences in music genre perception. In this work, we study the feasibility of obtaining relevant cross-lingual, culture-specific music genre annotations based only on language-specific semantic representations, namely distributed concept embeddings and ontologies. Our study, focused on six languages, shows that unsupervised cross-lingual music genre annotation is feasible with high accuracy, especially when combining both types of representations. This approach to studying music genres is the most extensive to date and has many implications in musicology and music information retrieval. In addition, we introduce a new, domain-dependent cross-lingual corpus to benchmark state-of-the-art multilingual pre-trained embedding models.
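Mapping genres across languages via concept embeddings amounts to a nearest-neighbor lookup in a shared vector space. The toy sketch below illustrates that idea with cosine similarity over made-up vectors; real systems use learned multilingual concept embeddings, and the genre names and values here are purely illustrative.

```python
# Toy sketch of cross-lingual genre mapping: for a source-language
# genre embedding, pick the nearest target-language genre by cosine
# similarity. Vectors and genre names are made-up illustrations.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_genre(src_vec, tgt_embeddings):
    return max(tgt_embeddings, key=lambda g: cosine(src_vec, tgt_embeddings[g]))

fr = {"musique electronique": [0.9, 0.1, 0.0]}
en = {"electronic": [0.8, 0.2, 0.1], "folk": [0.1, 0.9, 0.3]}
match = nearest_genre(fr["musique electronique"], en)  # "electronic"
```

Combining such embedding similarity with ontology structure, as the abstract notes, is what pushes the annotation accuracy up.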
Poetic and Cultural Dimensions of an Ovidian Mythical Typology
Eleonora Tola
Taking a "differential" approach to myth, we examine the Ovidian version of the Theban Cycle in Book III of the Metamorphoses. Identifying some of the poetic and ideological valences specific to this actualization of the mythical saga allows us to make explicit the literary functioning of one of the principal cultural motifs of the Roman system of beliefs. We show that the foundational legend of Cadmus emblematically constructs a pattern that runs through representations of collective history in Latin literature.
Philology. Linguistics, Greek language and literature. Latin language and literature
The Significance of the ethne in Aristotle's Politics and Politeiai
Gertrud Dietze-Mager
The Politeiai are one of Aristotle's historical works; several hundred fragments have come down to us. While Aristotle's Nomima barbarika recorded the customs of the barbarian ethne, the Politeiai are generally considered a collection of polis constitutions. A closer look reveals, however, that alongside a majority of Greek poleis Aristotle also included several ethne in his Politeiai, namely those in the north(west) of the Greek mainland and on the Peloponnese. This article tries to shed light on Aristotle's reasons for selecting these ethne. On the basis of key passages in the Politics, the author argues that their presence in the Politeiai indicates that Aristotle considered them Hellenic and, although inferior in status to the polis, capable of having a politeia. In Aristotle's time, nearly all of the ethne known to have been included in the Politeiai had formed koina. While Aristotle did not explicitly discuss the federal state, he acknowledged its existence in both the Politics and the Politeiai, evidently inspired by the political reality of his time, in which the koina played an increasingly prominent role, illustrated by their presence as members of Hellenic treaties alongside the poleis.
History of the Greco-Roman World, Greek language and literature. Latin language and literature