PL-Guard: Benchmarking Language Model Safety for Polish
Aleksandra Krasnodębska, Karolina Seweryn, Szymon Łukasik
et al.
Despite increasing efforts to ensure the safety of large language models (LLMs), most existing safety assessments and moderation tools remain heavily biased toward English and other high-resource languages, leaving the majority of the world's languages underexamined. To address this gap, we introduce a manually annotated benchmark dataset for language model safety classification in Polish. We also create adversarially perturbed variants of these samples designed to challenge model robustness. We conduct a series of experiments to evaluate LLM-based and classifier-based models of varying sizes and architectures. Specifically, we fine-tune three models: Llama-Guard-3-8B, a HerBERT-based classifier (a Polish BERT derivative), and PLLuM, a Polish-adapted Llama-8B model. We train these models using different combinations of annotated data and evaluate their performance, comparing it against publicly available guard models. Results demonstrate that the HerBERT-based classifier achieves the highest overall performance, particularly under adversarial conditions.
Child vs. machine language learning: Can the logical structure of human language unleash LLMs?
Uli Sauerland, Celia Matthaei, Felix Salfner
We argue that human language learning proceeds in a manner that is different in nature from current approaches to training LLMs, predicting a difference in learning biases. We then present evidence from German plural formation by LLMs that confirms our hypothesis that even very powerful implementations produce results that miss aspects of the logic inherent to language that humans have no problem with. We conclude that attention to the different structures of human language and artificial neural networks is likely to be an avenue to improve LLM performance.
Officina di IG XIV2 – I graffiti su pilastro dall’acropoli di Monte Sannace
Fanizzi, Federica
This study aims to provide an epigraphic and contextual analysis of the graffiti on the pillar from the acropolis of Monte Sannace. The data, of exceptional significance, are especially valuable considering the extensive research conducted on abecedaria and ancient languages of the Mediterranean. The document includes four abecedaria and six other graffiti, which are analyzed in relation to their regional epigraphic landscape and through a comparison coherent with their micro-epigraphic context.
Ancient history, Greek philology and language
Evaluating Large Language Models along Dimensions of Language Variation: A Systematik Invesdigatiom uv Cross-lingual Generalization
Niyati Bafna, Kenton Murray, David Yarowsky
While large language models exhibit certain cross-lingual generalization capabilities, they suffer from performance degradation (PD) on unseen closely-related languages (CRLs) and dialects relative to their high-resource language neighbour (HRLN). However, we currently lack a fundamental understanding of what kinds of linguistic distances contribute to PD, and to what extent. Furthermore, studies of cross-lingual generalization are confounded by unknown quantities of CRL language traces in the training data, and by the frequent lack of availability of evaluation data in lower-resource related languages and dialects. To address these issues, we model phonological, morphological, and lexical distance as Bayesian noise processes to synthesize artificial languages that are controllably distant from the HRLN. We analyse PD as a function of underlying noise parameters, offering insights on model robustness to isolated and composed linguistic phenomena, and the impact of task and HRL characteristics on PD. We calculate parameter posteriors on real CRL-HRLN pair data and show that they follow computed trends of artificial languages, demonstrating the viability of our noisers. Our framework offers a cheap solution for estimating task performance on an unseen CRL given HRLN performance using its posteriors, as well as for diagnosing observed PD on a CRL in terms of its linguistic distances from its HRLN, and opens doors to principled methods of mitigating performance degradation.
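The idea of a noise process with a controllable parameter can be illustrated with a minimal sketch. This is not the authors' noiser: the function name `noise_text`, the parameter `theta`, and the uniform character substitution are all assumptions made for illustration, standing in for the phonological, morphological, and lexical noisers the abstract describes.

```python
import random

def noise_text(text, theta, rng, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Replace each letter with a different random letter with probability theta.

    Hypothetical simplification: a single Bernoulli parameter (theta) controls
    how far the synthetic text drifts from the source-language text.
    """
    out = []
    for ch in text:
        if ch.lower() in alphabet and rng.random() < theta:
            # Pick a replacement that differs from the original letter.
            choices = [c for c in alphabet if c != ch.lower()]
            new = rng.choice(choices)
            out.append(new.upper() if ch.isupper() else new)
        else:
            # Non-letters (spaces, punctuation) are left untouched.
            out.append(ch)
    return "".join(out)

rng = random.Random(0)  # fixed seed so the run is repeatable
for theta in (0.0, 0.1, 0.5):
    print(theta, noise_text("a systematic investigation", theta, rng))
```

Sweeping `theta` from 0 upward and measuring task performance at each value is one way to picture performance degradation as a function of a noise parameter, under the assumptions stated above.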
A voz do Autor na comédia greco-latina
Maria de Fátima Silva
From the direct, personal attack typical of the vigorous and didactic tone of Greek Old Comedy, an echo stands out in the man who was, in the development of the genre throughout Antiquity, its last great name: Terence. He was now driven by a clear purpose: to defend himself against the animosity of competitors and rivals and to clarify the artistic criteria by which he sought to guide his work. Such a concern was not foreign to the old poets of Athens. It therefore becomes possible, despite all the differences separating two remarkable moments in the history of comedy, to bring together two poets, Aristophanes and Terence, in something that stirred them both alike: the struggle for the renewal and progress of an elevated art, for which two indispensable qualities qualified them: the spark of genius and the experience of the long road that, through continuous effort, leads to success.
History of the Greco-Roman World, Greek language and literature. Latin language and literature
Inhaltsverzeichnis
Susanne Aretz
Greek language and literature. Latin language and literature, Philology. Linguistics
Implicit Self-supervised Language Representation for Spoken Language Diarization
Jagabandhu Mishra, S. R. Mahadeva Prasanna
In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, implicit frameworks are preferable to explicit ones, as they can be easily adapted to deal with low/zero-resource languages. Inspired by the speaker diarization (SD) literature, three frameworks based on (1) fixed segmentation, (2) change point-based segmentation and (3) E2E are proposed to perform LD. The initial exploration with the synthetic TTSF-LD dataset shows that using the x-vector as an implicit language representation with an appropriate analysis window length ($N$) can achieve performance on par with explicit LD. The best implicit LD performance of $6.38$ in terms of Jaccard error rate (JER) is achieved by using the E2E framework. However, with the E2E framework, the performance of implicit LD degrades to a JER of $60.4$ on the practical Microsoft CS (MSCS) dataset. The difference in performance is mostly due to the distributional difference between the monolingual segment durations of the secondary language in the MSCS and TTSF-LD datasets. Moreover, to avoid segment smoothing, the shorter duration of the monolingual segments suggests the use of a small value of $N$. At the same time, with a small $N$, the x-vector representation is unable to capture the required language discrimination due to acoustic similarity, as the same speaker is speaking both languages. Therefore, to resolve the issue, a self-supervised implicit language representation is proposed in this study. In comparison with the x-vector representation, the proposed representation provides a relative improvement of $63.9\%$ and achieves a JER of $21.8$ using the E2E framework.
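The reported relative improvement can be checked from the two JER values quoted in the abstract. The sketch below assumes the improvement is measured as the relative reduction in JER on the MSCS dataset with the E2E framework (from 60.4 to 21.8); the function and variable names are illustrative only.

```python
def relative_improvement(baseline, improved):
    """Relative reduction of an error metric, in percent (lower is better)."""
    return 100.0 * (baseline - improved) / baseline

x_vector_jer = 60.4   # JER with the x-vector representation (from the abstract)
proposed_jer = 21.8   # JER with the proposed self-supervised representation

gain = relative_improvement(x_vector_jer, proposed_jer)
print(f"{gain:.1f}%")  # prints "63.9%"
```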
Another Early Evidence of the Rus’?
A. Vinogradov
The article discusses the question of the ethnonym Ῥουσ- in Byzantine literature, which is attested in the tenth and eleventh centuries, either as part of an adjective or in sources under Rus’ian influence, and then disappears until the fifteenth century. In this connection, the question arises whether the ethnonym Ῥούσιος, which is documented by Liutprand of Cremona but could be explained as an influence of the Latin language, actually existed. The reader’s attention is drawn to the ethnonym Ῥούσιοι in the list of the peoples conquered by Alexander the Great, known from two manuscripts of the edition γ of the Greek Alexander Romance. Despite the presence of a number of fantastic peoples in this list, the Ῥούσιοι were real “barbarians” and, along with the Χουνάβιοι, belonged to the source’s latest layer, dating from the tenth century. Several arguments make it necessary to identify them with the Rus’. Thus, the first evidence of the ethnonym Ῥούσιοι appears on purely Greek soil. Moreover, the use of the early ethnonym Ῥούσιοι allows one to date this list of peoples from the edition γ of the Greek Alexander Romance, the earliest manuscripts of which date back to the fourteenth century, to the period before the twelfth century.
Ricordo di Giovanni Cerri
Lomiento, Liana
An obituary of Giovanni Cerri (1 October 1940-5 September 2021).
Greek language and literature. Latin language and literature, History of the Greco-Roman World
O Uraguay, de Basílio da Gama: questões retórico-políticas coloniais entre a Antiguidade e a Modernidade
Dreykon Fernandes Nascimento, Leni Ribeiro Leite
This article is the result of a research project that sought to observe the use of the rhetorical and poetic machinery of Antiquity during the turbulent Pombaline period in Portugal and its impact on Portuguese America. We understand that Early Modernity used the discursive elements bequeathed by Antiquity as an instrument for the practical and symbolic construction of a statement about the conflicts unfolding in the colonies, and we use the proem of the poem O Uraguay, by the Luso-Brazilian author Basílio da Gama, as an example for analysing and observing this phenomenon.
History of the Greco-Roman World, Greek language and literature. Latin language and literature
Retórica, memoria y medicina: el Ars memorativa de Lorenz Fries
Manuel Mañas Núñez
This article analyses the Ars memorativa (1523) of Lorenz Fries, a physician interested in the technique of artificial memory as a system per locos et imagines, who also holds that natural memory can be strengthened through diet and with pharmacological and medical remedies.
History of the Greco-Roman World, Greek language and literature. Latin language and literature
Drivers' attention detection: a systematic literature review
Luiz G. Véras, Anna K. F. Gomes, Guilherme A. R. Dominguez
et al.
Many traffic accidents occur because of driver inattention. Many factors can contribute to distraction while driving, ranging from objects or events to physiological conditions such as drowsiness and fatigue, which prevent the driver from staying attentive. Technological progress has enabled the development and application of many solutions to detect attention in real situations, attracting the interest of the scientific community in recent years. Commonly, these solutions identify the lack of attention and alert the driver, in order to help her/him recover attention, avoiding serious accidents and preserving lives. Our work presents a Systematic Literature Review (SLR) of the methods and criteria used to detect the attention of drivers at the wheel, focusing on image-based methods. As a result, 50 studies were selected from the literature on drivers' attention detection, of which 22 contain solutions in the desired context. The results of the SLR can be used as a resource in the preparation of new research projects in drivers' attention detection.
Challenges in Measuring Bias via Open-Ended Language Generation
Afra Feyza Akyürek, Muhammed Yusuf Kocyigit, Sejin Paik
et al.
Researchers have devised numerous ways to quantify social biases vested in pretrained language models. As some language models are capable of generating coherent completions given a set of textual prompts, several prompting datasets have been proposed to measure biases between social groups, posing language generation as a way of identifying biases. In this opinion paper, we analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results. We find that the practice of measuring biases through text completion is prone to yielding contradictory results under different experiment settings. We additionally provide recommendations for reporting biases in open-ended language generation for a more complete outlook of biases exhibited by a given language model. Code to reproduce the results is released at https://github.com/feyzaakyurek/bias-textgen.
Systematic review of development literature from Latin America between 2010-2021
Pedro Alfonso de la Puente, Juan José Berdugo Cepeda, María José Pérez Pacheco
The purpose of this systematic review is to identify and describe the state of development literature published in Latin America, in Spanish and English, since 2010. For this, we carried out a topographic review of 44 articles available in the most important bibliographic indexes of Latin America, published in journals of diverse disciplines. Our analysis focused on the nature and composition of the literature, finding that a large proportion of articles come from Mexico and Colombia and specialize in economics. The most relevant articles reviewed show methodological and thematic diversity, with special attention to the problem of growth in Latin American development. An important limitation of this review is the exclusion of articles published in Portuguese, as well as non-indexed literature (such as theses and dissertations). This leads to various recommendations for future reviews of the development literature produced in Latin America.
Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
Zhihong Chen, Guanbin Li, Xiang Wan
Medical vision-and-language pre-training (Med-VLP) has received considerable attention owing to its applicability to extracting generic vision-and-language representations from medical images and texts. Most existing methods contain three main elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks, with few studies considering the importance of medical domain expert knowledge and explicitly exploiting such knowledge to facilitate Med-VLP. Although there exist knowledge-enhanced vision-and-language pre-training (VLP) methods in the general domain, most require off-the-shelf toolkits (e.g., object detectors and scene graph parsers), which are unavailable in the medical domain. In this paper, we propose a systematic and effective approach to enhance Med-VLP with structured medical knowledge from three perspectives. First, considering that knowledge can be regarded as the intermediate medium between vision and language, we align the representations of the vision encoder and the language encoder through knowledge. Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as a supplement to the input image and text. Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks. To perform a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark including three tasks. Experimental results illustrate the effectiveness of our approach, where state-of-the-art performance is achieved on all downstream tasks. Further analyses explore the effects of different components of our approach and various settings of pre-training.
Scaling Native Language Identification with Transformer Adapters
Ahmet Yavuz Uluslu, Gerold Schneider
Native language identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is useful for a variety of purposes including marketing, security and educational applications. NLI is usually framed as a multi-class classification task, where numerous designed features are combined to achieve state-of-the-art results. Recently, a deep generative approach based on transformer decoders (GPT-2) outperformed its counterparts and achieved the best results on the NLI benchmark datasets. We investigate this approach to determine the practical implications compared to traditional state-of-the-art NLI systems. We introduce transformer adapters to address memory limitations and improve training/inference speed to scale NLI applications for production.
Nikos Kazantzakis’ Unshot Adaptations of Don Quixote and Decameron
Panayiota Mini
This article examines two of Nikos Kazantzakis’ unshot screenplays of the early 1930s: his adaptations of Cervantes’ Don Quixote and Boccaccio’s Decameron, kept in typed manuscripts at the Nikos Kazantzakis Museum Foundation in Iraklion, Crete. The article analyses Kazantzakis’ Don Quixote and Decameron in the contexts of early talking cinema and his ideas of the image-language relationship. Written at a time when the artistic value of talking cinema was still debated, Kazantzakis’ adaptations demonstrate that he sought to express ideas with images rather than dialogue (Don Quixote) and use sound as a creative element (Decameron) in ways alluding to Eisenstein’s 1928-1929 writings, with which, as evidence suggests, the Greek author was familiar. Thus, Kazantzakis’ Don Quixote and Decameron show how a technological development in film history (the coming of sound) and Soviet film theory influenced this author’s adaptation techniques, while also enhancing our understanding of his creative career as well as the worldwide resonance of Cervantes’ and Boccaccio’s literary milestones.
Ancient history, Greek language and literature. Latin language and literature
Digitalne kompetencije učitelja i nastavnika klasičnih jezika
Kornelija Pavlić
Ancient history, Greek language and literature. Latin language and literature
Del Nero, Valerio, Juan Luis Vives. Scritti politico-filosofici, Canterano, Aracne Editrice, 2020, 384 págs., ISBN: 978-88-255-3028-5.
Enrique González González
Book review
Greek language and literature. Latin language and literature
Lex Irnitana – flavijski municipalni zakon
Milan Lovenjak
The article discusses the law granting municipal rights to the Flavian municipium of Irni (the so-called lex Irnitana) in the present-day Spanish province of Seville. The text is known thanks to six bronze tablets discovered in 1981 at the site of the ancient town, while parts of two other municipal charters, those of Salpensa and Malaca, which partly overlap with the preserved chapters of the lex Irnitana, show that all three follow the same overarching law from the period of the Flavian emperors. Thanks to these copies we know approximately three quarters of the entire text, i.e. 73 of 97 chapters. The individual chapters of the law deal with magistracies, elections, the town council, which in Irni numbered 63 members, administrative and financial matters, the administration of justice, the selection of judges in civil proceedings, and so on. Jurisdiction was regarded as the true criterion of municipal autonomy and as such, with chapters 84 to 92, occupies more space than any other area.
Greek language and literature. Latin language and literature