Large language models (LLMs) have become an essential tool for natural language processing and artificial intelligence in general. Current open-source models are primarily trained on English texts, resulting in poorer performance on less-resourced languages and cultures. We present a set of methodological approaches necessary for the successful adaptation of an LLM to a less-resourced language, and demonstrate them using the Slovene language. We present GaMS3-12B, a generative model for Slovene with 12 billion parameters, and demonstrate that it is the best-performing open-source model for Slovene within its parameter range. We adapted the model to the Slovene language using three-stage continual pre-training of the Gemma 3 model, followed by two-stage supervised fine-tuning (SFT). We trained the model on a combination of 140B Slovene, English, Bosnian, Serbian, and Croatian pretraining tokens, and over 200 thousand English and Slovene SFT examples. We evaluate GaMS3-12B on the Slovenian-LLM-Eval datasets, English-to-Slovene translation, and the Slovene LLM arena. We show that the described model outperforms the 12B Gemma 3 model across all three scenarios and performs comparably to the much larger commercial GPT-4o in the Slovene LLM arena, achieving a win rate of over 60%.
In Letter 12 Spinoza uses two expressions that are as significant as they are unusual: “a rebus aeternis fluit” and “ab aeternitate fluunt”. It has been argued that their origin may lie in Cicero's De divinatione, a work Spinoza would have known in depth, although the expression is also found in the Disputationes metaphysicae of Suárez, the leading representative of the scholasticism of the period. This paper sets out the reasons that would speak in favour of each hypothesis. It likewise examines the possible intermediaries between the cited authors and Spinoza, in particular the Thesaurus Ciceronianus of Mario Nizolius, a book that seems to explain the presence of numerous Ciceronian works in Spinoza, and the Meletemata philosophica of Adriaan Heereboord.
History of the Greco-Roman World, Greek language and literature. Latin language and literature
This essay analyses the interpretation of the Promethean myth in the work of Almada Negreiros. His deconstruction of Prometheus is compared with the versions of Aeschylus and Percy Bysshe Shelley and is contextualized as an updating of the myth for the twentieth century, one that uses the mythical narrative as an example of the (Promethean) role of the Artist in society, as formulated by Almada Negreiros.
History of the Greco-Roman World, Greek language and literature. Latin language and literature
Rosario Moreno Soldevila, Manuel Alejandro González Muñoz, Alberto Marina Castillo et al.
This paper describes four methodological proposals for rescuing from oblivion and highlighting women writers in Graeco-Roman Antiquity. In workshops employing a variety of active methodologies, students become acquainted with Greek writers like Sappho, Diotima of Mantinea and Aspasia, and their Roman counterparts, including Sulpicia and Agrippina the Younger, while also becoming aware of the authorship of these women writers and their lack of visibility. The proposals take the shape of activities aimed at fostering a vocation for science among baccalaureate students in Spain but can also be easily adapted to secondary and even higher education in other educational contexts.
Badr AlKhamissi, Greta Tuckute, Yingtian Tang et al.
Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language shaping brain-like representations, and their evolution during training as a function of different tasks remain unclear. We here benchmark 34 training checkpoints spanning 300B tokens across 8 different model sizes to analyze how brain alignment relates to linguistic competence. Specifically, we find that brain alignment tracks the development of formal linguistic competence -- i.e., knowledge of linguistic rules -- more closely than functional linguistic competence. While functional competence, which involves world knowledge and reasoning, continues to develop throughout training, its relationship with brain alignment is weaker, suggesting that the human language network primarily encodes formal linguistic structure rather than broader cognitive functions. We further show that model size is not a reliable predictor of brain alignment when controlling for feature size and find that the correlation between next-word prediction, behavioral alignment and brain alignment fades once models surpass human language proficiency. Finally, using the largest set of rigorous neural language benchmarks to date, we show that language brain alignment benchmarks remain unsaturated, highlighting opportunities for improving future models. Taken together, our findings suggest that the human language network is best modeled by formal, rather than functional, aspects of language.
According to Futrell and Mahowald [arXiv:2501.17047], both infants and language models (LMs) find attested languages easier to learn than impossible languages that have unnatural structures. We review the literature and show that LMs often learn attested languages and many impossible languages equally well. The impossible languages that are difficult to learn are simply more complex (or random). LMs are missing the human inductive biases that support language acquisition.
This paper presents ElliottAgents, a multi-agent system leveraging natural language processing (NLP) and large language models (LLMs) to analyze complex stock market data. The system combines AI-driven analysis with the Elliott Wave Principle to generate human-comprehensible predictions and explanations. A key feature is the natural language dialogue between agents, enabling collaborative analysis refinement. The LLM-enhanced architecture facilitates advanced language understanding, reasoning, and autonomous decision-making. Experiments demonstrate the system's effectiveness in pattern recognition and generating natural language descriptions of market trends. ElliottAgents contributes to NLP applications in specialized domains, showcasing how AI-driven dialogue systems can enhance collaborative analysis in data-intensive fields. This research bridges the gap between complex financial data and human understanding, addressing the need for interpretable and adaptive prediction systems in finance.
People are increasingly turning to large language models (LLMs) for healthcare advice and consultation, making it important to gauge the efficacy and accuracy of LLM responses to such queries. While existing medical benchmarks seek to accomplish this task, they are almost universally in English, which has left a notable gap in the literature on multilingual LLM evaluation. In this work, we seek to help address this gap with Cancer-Myth-Indic, an Indic language benchmark built by translating a 500-item subset of Cancer-Myth, sampled evenly across its original categories, into five under-served but widely used languages from the subcontinent (500 per language; 2,500 translated items total). Native-speaker translators followed a style guide for preserving implicit presuppositions in translation; items feature false presuppositions relating to cancer. We evaluate several popular LLMs under this presupposition stress.
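The subset construction described in this abstract (an even draw across categories, then one translation per language) can be sketched roughly as follows. This is only an illustration under stated assumptions: the category names, the toy item pool, the language placeholders, and the `sample_evenly` helper are hypothetical and do not come from the benchmark itself; only the figures 500, five languages, and 2,500 are taken from the abstract.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def sample_evenly(items: List[Dict[str, str]], total: int, seed: int = 0) -> List[Dict[str, str]]:
    """Draw `total` items, split as evenly as possible across categories."""
    by_cat = defaultdict(list)
    for item in items:
        by_cat[item["category"]].append(item)
    cats = sorted(by_cat)
    base, extra = divmod(total, len(cats))
    rng = random.Random(seed)
    subset = []
    for i, cat in enumerate(cats):
        quota = base + (1 if i < extra else 0)  # first `extra` categories get one more item
        subset.extend(rng.sample(by_cat[cat], quota))
    return subset


# Toy stand-in for the source benchmark (assumption): 4 made-up categories, 200 items each.
items = [{"id": f"{cat}-{i}", "category": cat}
         for cat in ("cat_a", "cat_b", "cat_c", "cat_d") for i in range(200)]

subset = sample_evenly(items, total=500)
per_category = Counter(item["category"] for item in subset)

# Placeholder language codes (assumption): the abstract does not name the five languages.
languages = ["lang_1", "lang_2", "lang_3", "lang_4", "lang_5"]
translated_total = len(subset) * len(languages)  # 500 items x 5 languages

print(per_category, translated_total)
```

With four equally sized toy categories, each contributes 125 items; with a category count that does not divide 500, the helper gives the remainder to the first categories in sorted order, which is one of several reasonable conventions.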
The translation of scientific medical texts is an essential component of medical education, as it enriches students’ medical terminology and provides new information for future professional use. This article examined the translation of medical literature by medical students during their studies and the challenges encountered when working with scientific sources. The study aimed to analyse current issues faced by students in translating, adapting, and utilising medical literature and to provide recommendations for addressing these difficulties. The findings indicated that the translation process is fraught with challenges, particularly due to the complexity of medical terminology, the presence of Greek and Latin roots, the use of calques and Anglicisms, and the lack of direct equivalents in the target language. These factors contribute to inaccurate or imprecise translations, which may lead to misinformation when applied in professional practice. Additionally, linguistic and cultural barriers further complicate the translation process. The article also explored key strategies for addressing each of the identified challenges, aiming to facilitate junior medical students’ work with scientific medical texts. Methods that yield the best results in language learning and help enhance communication skills were proposed, including engagement with native speakers, regular use of dictionaries and encyclopaedias, and practical translation exercises to develop additional skills. The identification of common errors and difficulties encountered by students will contribute to improving the learning process and enhancing the quality of medical text translations. The findings may be of value to educators in medical institutions as well as professionals involved in the translation of medical literature.
The teaching of ancient languages at university level is usually quite different from its counterpart in secondary schools: the latter offer only a small number of such languages (e.g. Latin and Greek) compared with the broader spectrum available at universities. At the same time, secondary-school courses traditionally last longer and, in addition to introducing the language, include a basic education in its literature, culture, and history, which is not necessarily the case at university level. This paper argues that, particularly for less commonly studied languages, such contextualisation offers learners much-needed insights into the workings of the language they are studying and facilitates the homogenisation of disparate learner groups. This claim is illustrated with the example of Classical Armenian: learners from different disciplines (theology, history, linguistics, etc.) take such a course, arriving with different abilities, background knowledge, and expectations. Unless additional courses on Armenian history, etc. are provided, the learners’ diverse interests can only be addressed as an integral part of language learning. This approach helps maintain the learners’ zeal and supports a better understanding of literature. While the weighting of materials used should depend on the individual group’s composition, a corresponding textbook should include them in roughly equal parts. Yet all information should remain pertinent to the primary goal: language learning. The solution proposed here is the seamless integration of such historical and cultural information into the grammatical exercises and readings, together with regular excursuses on relevant topics.
Hoang H Nguyen, Khyati Mahajan, Vikas Yadav et al.
Although multilingual LLMs have achieved remarkable performance across benchmarks, we find they continue to underperform on non-Latin script languages across contemporary LLM families. This discrepancy arises from the fact that LLMs are pretrained with orthographic scripts, which are dominated by Latin characters that obscure their shared phonology with non-Latin scripts. We propose leveraging phonemic transcriptions as complementary signals to induce script-invariant representations. Our study demonstrates that integrating phonemic signals improves performance across both non-Latin and Latin script languages, with a particularly significant impact on closing the performance gap between the two. Through detailed experiments, we show that phonemic and orthographic scripts retrieve distinct examples for in-context learning (ICL). This motivates our proposed Mixed-ICL retrieval strategy, where further aggregation from both leads to our significant performance improvements for both Latin script languages (up to 12.6%) and non-Latin script languages (up to 15.1%) compared to randomized ICL retrieval.
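To make the Mixed-ICL idea in this abstract more concrete, the following Python sketch retrieves in-context examples separately from an orthographic view and a phonemic view of a small pool, then merges the two result lists. Everything specific here is an assumption made for illustration: the character-trigram Jaccard score, the interleaving merge, the toy pool with rough made-up transcriptions, and all function names. The abstract does not say how the authors score or aggregate examples.

```python
from typing import Dict, List, Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of a string (the whole string if shorter than n)."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: str, b: str) -> float:
    """Trigram Jaccard overlap; a simple stand-in for a real retriever score."""
    x, y = char_ngrams(a), char_ngrams(b)
    return len(x & y) / len(x | y)


def retrieve(query: str, pool: List[Dict[str, str]], view: str, k: int) -> List[int]:
    """Indices of the k pool items most similar to the query in the given view."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: jaccard(query, pool[i][view]),
                    reverse=True)
    return ranked[:k]


def mixed_icl(query: Dict[str, str], pool: List[Dict[str, str]], k: int) -> List[int]:
    """Interleave orthographic and phonemic retrievals, drop duplicates, keep k."""
    ortho = retrieve(query["ortho"], pool, "ortho", k)
    phon = retrieve(query["phon"], pool, "phon", k)
    merged: List[int] = []
    for pair in zip(ortho, phon):
        for idx in pair:
            if idx not in merged:
                merged.append(idx)
    return merged[:k]


# Toy pool (assumption): "phon" strings are rough, made-up romanized transcriptions.
pool = [
    {"ortho": "привет мир", "phon": "privet mir"},
    {"ortho": "good morning", "phon": "gud mornin"},
    {"ortho": "private drug", "phon": "praivet drug"},
]
query = {"ortho": "привет друг", "phon": "privet drug"}

ortho_hits = retrieve(query["ortho"], pool, "ortho", 2)
phon_hits = retrieve(query["phon"], pool, "phon", 2)
mixed_hits = mixed_icl(query, pool, 2)
print(ortho_hits, phon_hits, mixed_hits)
```

In this toy setting the orthographic view can only match the same-script item, while the phonemic view also surfaces a Latin-script item that sounds similar, so the merged list draws one example from each view.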
The English language is in the spotlight of the Natural Language Processing (NLP) community, with other languages, like Greek, lagging behind in terms of available methods, tools and resources. Given the increasing interest in NLP, in this paper we try to condense research efforts on the automatic processing of the Greek language over the last three decades. In particular, we list and briefly discuss related works, resources and tools, categorized according to various processing layers and contexts. We do not restrict ourselves to the modern form of the Greek language but also cover Ancient Greek and various Greek dialects. This survey can be useful for researchers and students interested in NLP tasks, Information Retrieval and Knowledge Management for the Greek language.
Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several methods have been used to investigate the origin of our language, including agent-based systems, Bayesian agents, genetic algorithms, and rule-based systems. This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models. The chapter introduces the basic concepts of deep and reinforcement learning methods and summarizes their helpfulness for simulating language emergence. It also discusses the key findings, limitations, and recent attempts to build realistic simulations. This chapter targets linguists and cognitive scientists seeking an introduction to deep learning as a tool to investigate language evolution.
Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson et al.
The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in conjunction with the 31st International Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, United Arab Emirates. This workshop mainly aimed to provide a forum for researchers to share and discuss their ongoing work on language models (LMs) focusing on low-resource languages, following the recent advancements in neural language models and their linguistic biases towards high-resource languages. LoResLM 2025 attracted notable interest from the natural language processing (NLP) community, resulting in 35 accepted papers from 52 submissions. These contributions cover a broad range of low-resource languages from eight language families and 13 diverse research areas, paving the way for future possibilities and promoting linguistic inclusivity in NLP.
English is a West Germanic language that originated from Anglo-Frisian dialects brought to Britain in the mid-5th to 7th centuries AD by Anglo-Saxon migrants from what is now northwest Germany, west Denmark and the Netherlands. The language has undergone major changes and developments in its pronunciation, vocabulary, grammar, and orthography throughout its history of over 1,500 years. This article provides an overview of the key influences and developments that have shaped the English language into its present global form. It examines the linguistic influences of Celtic, Norse, French, Latin, Greek and other languages on English. It also explores the impact of historical events, the growth of literacy, the invention of the printing press, dictionary compilation and standardized spelling on the development of English. The analysis shows that English has an unparalleled capacity to absorb, adapt and incorporate words and features from other languages. Through the early spread of English around the British Isles, and later via 19th- and 20th-century colonization and globalization, English has become the most widely spoken language worldwide, with over 1.35 billion speakers.
Many factors, including societal shifts, technology, culture, and history, contribute to the gradual but steady development of literary language. Literary language has changed throughout history to reflect new social and intellectual paradigms, and this research follows that change from antiquity to the present day. It first examines the oral traditions of ancient civilizations, in which storytelling played a crucial role in preserving history and culture, and then the written traditions of classical antiquity, the Middle Ages, and the Renaissance. Literary language differs between periods due to the influence of prevalent linguistic conventions, philosophical ideas, and technological developments. The move from oral to written communication is a watershed moment because it paves the way for textual standardization and preservation. Greek and Latin were the de jure languages of the classical period, when many literary genres and styles flourished. Various vernacular languages emerged during the Middle Ages, when oral and written traditions mingled. The Renaissance, which saw an upsurge in creative activity and a return to classical study, brought a more sophisticated and nuanced use of language. The printing press, invented in the fifteenth century, made books more widely available, aided linguistic standardization, and profoundly affected the distribution of literary works. The research then examines how the Enlightenment, Romanticism, and the Industrial Revolution affected literary language and how the language of these eras mirrored broader social shifts. In the twentieth and twenty-first centuries, increased communication across borders, new technologies, and the proliferation of online media have made literary vocabulary more diverse than ever before. The evolution of literary language demonstrates how human expression is ever-changing, adjusting to novel situations and technological advances. This work gives a thorough analysis of this development and sheds light on the complex interplay of literature, language, and society.
Many under-resourced languages require high-quality datasets for specific tasks such as offensive language detection and disinformation or misinformation identification. However, the sensitive nature of the content may have a detrimental effect on the annotators. The article revisits an approach to pseudo-labeling sensitive data, using the example of Ukrainian tweets covering the Russian-Ukrainian war. This acute topic is currently the target of various language manipulations that give rise to a great deal of disinformation and profanity on social media platforms. The experiment highlights three main stages of data annotation and underlines the main obstacles encountered during machine annotation. Finally, we provide a fundamental statistical analysis of the obtained data, an evaluation of the models used for pseudo-labeling, and guidelines on how researchers can leverage the corpus to carry out more advanced research and extend the existing data samples without annotators' engagement.
This article describes the influence of the ancient and Christian traditions on the formation of the genology of the idyll. The idyll is a genre structure with a special kind of motivation, which was manifested in the methods of its conventionalization. The author presents the stage at which the idyll entered Polish literature of the Renaissance, starting with the first Latin-language traces of idyllic creativity by Pietro Illicino and Stanisław Koszutski. These literary works are currently little known, although the circumstances of their creation are very interesting. The article also discusses the bucolic provenance of the genre, with reference to the Italian and Greek traditions, and analyses the theme of the ancient and Roman Catholic Arcadia. The systematized theoretical record includes information on the relationship of the idyll with other pastoral genres, such as eclogues, carols or pastorales. The article also interprets idylls that refer to the Christian concept of Neoplatonic love, i.e. the divine Amor divinus and the worldly Amor profanus. The process of transforming a Latin idyll into a Polish-language idyll is presented in the context of the cultural and literary conditions of the time. The article describes important genre criteria (with textual exemplifications) gradually adopted by the first authors of pastoral literature, such as Jan Kochanowski, Szymon Szymonowic and the Zimorowic brothers. The works of these poets demonstrate the numerous transformations of idyllic criteria in specific textual realizations, showing the complicated path by which the idyll found its place in the literary canon and in Polish culture in general, together with its longevity and continued topicality. Key words: idyll, genology, Antiquity, Renaissance, Christianity, Jan Kochanowski, Szymon Szymonowic, Szymon Zimorowic, Józef Bartłomiej Zimorowic