Jörn Brüggemann, Carina Ascherl, Laureen Okesson
et al.
In literature didactics, chatbots such as ChatGPT raise the expectation that they can help master central challenges of acquisition and teaching in the area of literary text comprehension. It is unclear, however, to what extent this expectation is justified, as empirical evidence on the conditions for success and the limits of AI-supported promotion of literary comprehension competence is lacking. Against this background, this article first identifies unexamined assumptions concerning the systematic promotion of literary comprehension competence as well as the validity and reliability of AI-generated support (1). It then explains how these assumptions were investigated in two experimental settings with ChatGPT (based on the language models GPT-3.5 and -4o) within the BMBF collaborative project „Digitale Souveränität als Ziel wegweisender Lehrer:innenbildung in den Sprachen, Gesellschafts- und Wirtschaftswissenschaften" (DiSo-SGW) (2). On this basis, the article explains how the potential of ChatGPT for promoting literary comprehension is being explored in further studies within DiSo-SGW, in order to gain empirically validated insights into the implementation of AI-based support measures.
Abstract (english): Conditions for Success and Limitations of AI-supported Promotion of Literary Literacy. Findings from two Experimental Studies
In the realm of German didactics, the chatbot ChatGPT has raised expectations that AI-generated support can help to better overcome key challenges in the acquisition and teaching of literary literacy. However, it is currently uncertain how realistic these expectations are, as empirical evidence regarding the conditions for success and the limitations of AI-supported promotion of literary literacy is lacking. Against this background, this paper first identifies unexamined assumptions regarding the systematic promotion of literary literacy and the validity and reliability of AI-generated support (1). It then explains how these assumptions were investigated in two experimental settings with ChatGPT (based on the GPT-3.5 and -4o LLMs) as part of the BMBF collaborative project “Digital Sovereignty as a Goal of Forward-thinking Professional Development for Teachers in Languages, Social Sciences, and Economics” (DiSo-SGW) (2). On this basis, we explain how the potential of ChatGPT for promoting literary literacy is currently being investigated in an intervention study in DiSo-SGW, in order to gain empirically validated insights into the implementation of AI-based support measures.
Large language models (LLMs) have become an essential tool for natural language processing and artificial intelligence in general. Current open-source models are primarily trained on English texts, resulting in poorer performance on less-resourced languages and cultures. We present a set of methodological approaches necessary for the successful adaptation of an LLM to a less-resourced language, and demonstrate them using the Slovene language. We present GaMS3-12B, a generative model for Slovene with 12 billion parameters, and demonstrate that it is the best-performing open-source model for Slovene within its parameter range. We adapted the model to the Slovene language using three-stage continual pre-training of the Gemma 3 model, followed by two-stage supervised fine-tuning (SFT). We trained the model on a combination of 140B Slovene, English, Bosnian, Serbian, and Croatian pre-training tokens, and over 200 thousand English and Slovene SFT examples. We evaluate GaMS3-12B on the Slovenian-LLM-Eval datasets, English-to-Slovene translation, and the Slovene LLM arena. We show that the described model outperforms 12B Gemma 3 across all three scenarios and performs comparably to the much larger commercial GPT-4o in the Slovene LLM arena, achieving a win rate of over 60%.
This article examines the computer game Every day the same dream for its potential for media-reflective language and literature teaching. The free mini art game was developed in 2009 by Paolo Pedercini and is about an office worker whose life unfolds in an endless routine of work and monotony. The player attempts to break out of this endless loop, encountering irritating experiences of indeterminacy that call for interpretation.
Every day the same dream proves to be an ideal example for engaging with computer games in German lessons. It offers opportunities to build media-specific competencies, to analyze paratextual formats such as Let's Plays, to foster creative follow-up communication in the form of game-based co-narration, and to reflect on re-stagings using the example of two short films from Austria and Germany. For an interpretation, students can profitably draw on Brecht's theory and practice of epic theatre. The game encourages critical reflection on social issues, particularly in connection with work and identity, and challenges players to think about solutions that lie outside the game and raise political questions.
Abstract (english): This is our life, this is my life too. Linguistic-literary learning with the digital game "Every day the same dream"
The article examines the video game Every day the same dream for its potential for media-reflective language and literature lessons. The free mini art game was developed by Paolo Pedercini in 2009 and is about an office worker whose life unfolds in an endless routine of work and monotony. The player tries to break out of this endless loop, which leads to irritating experiences of indeterminacy that call for interpretation.
Every day the same dream proves to be an ideal example for dealing with computer games in German lessons. It offers the opportunity to build media-specific skills, analyze paratextual formats such as Let's Plays, promote creative follow-up communication in the form of game-based co-narration, and reflect on adaptations using the example of two short films from Austria and Germany. Students can profitably draw on Brecht's theory and practice of epic theatre for an interpretation. The game encourages critical reflection on social issues, particularly in connection with work and identity, and challenges the players to think about solutions that lie outside the game and raise political questions.
Coding is a fundamental skill in the engineering discipline, and much work exists exploring better ways of teaching coding in higher education. In particular, Code Snippets (CSs) have proven to be an effective way of introducing programming-language units to students. CSs are portions of source code of varying size and content. They can be used in a myriad of ways, one of which is to teach the code they contain as well as its function. To further explore the use of CSs, a pedagogical summer internship project was set up at the Warwick Manufacturing Group (WMG). The scope of the study derives from an educational standpoint. Within the evaluations, the focus was primarily on information that provided evidence about the methodology involved in either teaching or developing teaching materials. Taking the results into account from a pedagogical perspective, several qualities of popular code-snippet tutorials were found to benefit or hinder the learning process, including code length, interactivity, further support, and quality of explanation. These qualities are then combined and used to present a plan for the design of an effective learning resource that makes use of code snippets.
Machine translation has made impressive progress in recent years, offering close to human-level performance on many languages, but studies have primarily focused on high-resource languages with a broad online presence and ample resources. With the help of growing large language models, more and more low-resource languages achieve better results through the presence of related languages. However, studies have shown that not all low-resource languages can benefit from multilingual systems, especially those with insufficient training and evaluation data. In this paper, we revisit state-of-the-art neural machine translation techniques to develop automatic translation systems between German and Bavarian. We investigate conditions of low-resource languages such as data scarcity and parameter sensitivity, and focus both on refined solutions that combat low-resource difficulties and on creative solutions such as harnessing language similarity. Our experiments apply back-translation and transfer learning to automatically generate more training data and achieve higher translation performance. We document the noisiness of the data and present our approach to extensive text preprocessing. Evaluation was conducted using combined metrics: BLEU, chrF, and TER. Statistical significance tests with Bonferroni correction show surprisingly strong baseline systems, and that back-translation leads to significant improvement. Furthermore, we present a qualitative analysis of translation errors and system limitations.
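The chrF metric mentioned in the abstract scores a hypothesis by its character n-gram overlap with the reference. The sketch below is a simplified pure-Python illustration of that core idea, not the sacrebleu implementation typically used in such evaluations; the function names and defaults are ours.

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams of a string, with spaces removed (as chrF does)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: mean character n-gram F-beta score for n = 1..max_n."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:               # string shorter than n: skip order
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

sacrebleu's CHRF additionally supports word n-grams, whitespace handling, and corpus-level aggregation; this sketch only conveys why the metric rewards partial (sub-word) matches, which matters for morphologically close dialect pairs such as German and Bavarian.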
Jafar Isbarov, Kavsar Huseynova, Elvin Mammadov
et al.
The emergence of multilingual large language models has enabled the development of language understanding and generation systems in Azerbaijani. However, most of the production-grade systems rely on cloud solutions, such as GPT-4. While there have been several attempts to develop open foundation models for Azerbaijani, these works have not found their way into common use due to a lack of systemic benchmarking. This paper encompasses several lines of work that promote open-source foundation models for Azerbaijani. We introduce (1) a large text corpus for Azerbaijani, (2) a family of encoder-only language models trained on this dataset, (3) labeled datasets for evaluating these models, and (4) extensive evaluation that covers all major open-source models with Azerbaijani support.
Sina Bagheri Nezhad, Ameeta Agrawal, Rhitabrat Pokharel
Multilingual language models (MLLMs) are crucial for handling text across various languages, yet they often show performance disparities due to differences in resource availability and linguistic characteristics. While the impact of pre-training data percentage and model size on performance is well known, our study reveals additional critical factors that significantly influence MLLM effectiveness. Analyzing a wide range of features, including geographical, linguistic, and resource-related aspects, we focus on the SIB-200 dataset for classification and the Flores-200 dataset for machine translation, using regression models and SHAP values across 204 languages. Our findings identify token similarity and country similarity as pivotal factors, alongside pre-training data and model size, in enhancing model performance. Token similarity facilitates cross-lingual transfer, while country similarity highlights the importance of shared cultural and linguistic contexts. These insights offer valuable guidance for developing more equitable and effective multilingual language models, particularly for underrepresented languages.
The topic of court criticism coupled with severe warnings about the dangers of a royal dictator or tyrant was well represented in medieval and early modern literature. Despite our common assumptions about the harmony and idyllic nature of King Arthur’s court and the knights of the Round Table, a closer analysis quickly reveals the horrendous problems vexing medieval society (and our own, perhaps). However, medieval poets were careful not to take off their masks when they depicted evil rulers because they normally depended on their patrons. Nevertheless, the criticism of the evil ruler, and then especially of the criminally minded royal councilor (such as in the much later case of Iago in Shakespeare’s Othello) finds vivid expression in more medieval texts than we might have assumed. After a survey of dramatic cases from pre-modern literature as a basis for the subsequent analysis, this article focuses on the Middle High German version of the Old French Roman de Renart by Heinrich der Glîchezâre (late twelfth century) where the protagonist, the fox Reinhart, operates with astounding intellectual acumen and sophistication to deceive, betray, hurt, and even get his opponents killed without any bad conscience.
German literature, Germanic languages. Scandinavian languages
This article proceeds from the thesis that we live in a digital society that profoundly shapes our modes of behavior and interaction as well as the reception and production of cultural products. To better understand this society and its culture, and to orient oneself within it, a "post-digital" perspective is helpful, whose specifics are explained in the following. This entails not only a merging of the concepts of media and literature, but also a possibly fundamental change in the fields of work of literature didactics. This is demonstrated exemplarily with an economy-critical focus for literature teaching.
Abstract (english): How to Read Digitisation? Post-Digital Literature Teaching in Secondary Schools
This article is based on the thesis that we live in a digital society that profoundly shapes our modes of behavior and interaction, but also the reception and production of cultural products. In order to better understand this society with its culture and to be able to orient oneself in it, a “post-digital” perspective is helpful, which will be explained in the following. This is accompanied not only by a merging of the concepts of media and literature, but also by a possibly fundamental change in the fields of work of literature didactics. This is exemplified by an economy-critical focus on the production and reception of cultural products.
This paper analyzes the translation of Turkish loanwords (i.e., Turkisms) from literary works in Bosnian to German. The contemporary Bosnian language is rich in Turkisms, most of which were adopted during the Ottoman reign. The corpus for the analysis was selected from the novel Grozdanin kikot [Grozdana's Giggle] by the Bosnian writer Hamza Humo (1927). The German title was not a literal translation of the original: the translator Manfred Jähnichen (1958) instead opted for Trunkener Sommer [The Drunken Summer], and the novel was a success in German-speaking regions, with many positive reviews subsequently published in newspapers. The Turkisms in this novel mostly refer to objects and persons; some of the lexemes have been labelled archaic, while others are still in everyday use. However, all of these lexemes are considered part of Bosnian and Herzegovinian culture. Translating culture-specific concepts is often a challenging task. Therefore, this paper aims to analyze the translation strategies employed when translating Turkisms from Bosnian to German in this novel. The first phase of the analysis identifies all lexemes and phrases of Turkish origin and accounts for them in monolingual and bilingual dictionaries. The study then compares these findings to their translation equivalents in the novel. A semantic analysis provides answers as to whether Turkisms in Bosnian and Turkisms in German represent false cognates. This research also yields a glossary of the novel's Turkisms with their translation equivalents in German.
German literature, Germanic languages. Scandinavian languages
The rely-guarantee approach is a promising way to compositionally verify concurrent reactive systems (CRSs), e.g. concurrent operating systems, interrupt-driven control systems, and business process systems. However, specifications using heterogeneous reaction patterns, different abstraction levels, and the complexity of real-world CRSs still challenge the rely-guarantee approach. This article proposes PiCore, a rely-guarantee reasoning framework for the formal specification and verification of CRSs. We design an event specification language supporting complex reaction structures, together with its rely-guarantee proof system, to detach the specification and logic of the reactive aspects of CRSs from event behaviours. PiCore parametrizes the language and its rely-guarantee system for event behaviour using a rely-guarantee interface, and allows third-party languages to be easily integrated via rely-guarantee adapters. By this design, we have successfully integrated two existing languages and their rely-guarantee proof systems without any change to their specifications and proofs. PiCore has been applied to two real-world case studies: the formal verification of concurrent memory management in Zephyr RTOS, and a verified translation from the standardized Business Process Execution Language (BPEL) to PiCore.
In this paper, we present an approach for translating word embeddings from a majority language into 4 minority languages: Erzya, Moksha, Udmurt, and Komi-Zyrian. Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied to endangered-language data through the aligned word embeddings. To test our model, we annotated a small sentiment analysis corpus for the 4 endangered languages and Finnish. Our method reached at least 56% accuracy for each endangered language. The models and the sentiment corpus will be released together with this paper. Our research shows that state-of-the-art neural models can be used with endangered languages, with the only requirement being a dictionary between the endangered language and a majority language.
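The abstract does not spell out how the embedding spaces are aligned; a standard choice for this kind of cross-lingual alignment is orthogonal Procrustes over a bilingual seed dictionary. The NumPy sketch below illustrates that technique under this assumption; the function name is ours, not the paper's.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Orthogonal map W minimizing ||src @ W - tgt||_F for row-paired
    embedding matrices (rows paired via a bilingual seed dictionary).
    Closed form: W = U @ Vt, where U, _, Vt = SVD(src.T @ tgt)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# A source-language vector x is then mapped into the target space as x @ W,
# after which a model trained on the majority language can be applied.
```

Because W is orthogonal, the mapping preserves distances and angles in the source space, which is why a sentiment classifier trained on majority-language embeddings can be reused unchanged on the mapped minority-language vectors.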
There are two future time reference auxiliaries in Afrikaans, sal ‘will’ and gaan ‘go’. These auxiliaries are interchangeable in many contexts. In light of the ongoing grammaticalization of gaan, it is pertinent to describe the alternation between sal and gaan in different Afrikaans registers, and contextualize it in the West-Germanic language family where English and Dutch have similar alternating constructions. This is accomplished by analyzing Afrikaans corpus data from the 1970s and the 2000s, both spoken and written. Normalized frequencies and relative frequencies for the use of sal and gaan are reported according to a number of variables, including time, register, lexical verb, syntactic subject, clause type, sentence type, and future proximity. The effect of sentence type and future proximity is consistently present in all the datasets, and a possible change is detected in the effect of subject and clause type. Compared with English and Dutch, Afrikaans future alternation patterns more like that of English, even though it is more closely related to Dutch.
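The normalized (per-million-token) and relative frequencies reported for sal and gaan can be sketched as follows; the counts and corpus size in the example are invented for illustration and are not figures from the study.

```python
def frequencies(counts, corpus_size, per=1_000_000):
    """Normalized (per-million-token) and relative frequencies for
    competing variants in a corpus."""
    total = sum(counts.values())
    normalized = {w: c / corpus_size * per for w, c in counts.items()}
    relative = {w: c / total for w, c in counts.items()}
    return normalized, relative

# Hypothetical example counts, not taken from the Afrikaans corpora:
norm, rel = frequencies({"sal": 300, "gaan": 100}, corpus_size=2_000_000)
# norm["sal"] = 150.0 occurrences per million tokens; rel["sal"] = 0.75
```

Normalized frequencies make the 1970s and 2000s corpora of different sizes comparable, while relative frequencies capture the share of each auxiliary within the alternation itself.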
Connectors are linguistic elements that link statements between textual units (Duden, 2016, p. 1083) and function as conjunctions (Heringer, 1989, p. 353). Many scholars have studied connector usage by learners of English as a Foreign Language (EFL), but few such studies exist for German, with gaps remaining regarding the relatively limited classification of connectors, insufficient analytical dimensions, and a lack of studies on learners at different levels. This study is based on dynamic systems theory and uses 168 argumentative essays as its data basis, 155 of which were selected from the Lerner Corpus and cover four learning stages, with the rest drawn from the FalkoEssayL1 corpus. The study uses frequency, diversity, and accuracy as three variables in quantitative and qualitative analyses of Chinese German as a Foreign Language (GFL) learners' acquisition of connectors. The results indicate that Chinese GFL learners' acquisition of connectors shows a non-unidirectional, non-linear, and interactive developmental tendency. Their usage of connectors also shows certain characteristics compared to native German speakers. In addition, the phenomena of lexical plateau and fossilization are observable among GFL learners during specific learning stages. Accordingly, this study verifies and even enriches dynamic systems theory with specific lexical data. Based on the obtained results, the factors influencing connector usage are attributable to four aspects: language input, linguistic interference, learning strategies, and thought patterns. The final section of the article discusses didactic suggestions based on these results.
German literature, Germanic languages. Scandinavian languages
An augmented version of the C programming language is presented. The language is extended with a series of low-level and high-level facilities to broaden its usage spectrum across various computing systems, operations, and users. Ambiguities and inconsistencies have been resolved by managing problematic and undefined language elements through an interpretation and management similar to that used in other C-syntax-based languages. The proposed augmentations, following the @C approach, preserve the spirit of the C language and its basic characteristics through compatibility with the standard version, while rejuvenating C and bringing it up to the state of the art of present programming languages.
In this paper we study the behaviour of conjugacy languages in virtual graph products, extending results by Ciobanu, Hermiller, Holt and Rees. We focus primarily on virtual graph products in the form of a semi-direct product. First, we study the behaviour of twisted conjugacy representatives in right-angled Artin and Coxeter groups. We prove regularity of the conjugacy geodesic language for virtual graph products in certain cases, and highlight properties of the spherical conjugacy language, depending on the automorphism and ordering on the generating set. Finally, we give a criterion for when the spherical conjugacy language is not unambiguous context-free for virtual graph products. We can extend this further in the case of virtual RAAGs, to show the spherical conjugacy language is not context-free.
Accuracy of English-language Question Answering (QA) systems has improved significantly in recent years with the advent of Transformer-based models (e.g., BERT). These models are pre-trained in a self-supervised fashion with a large English text corpus and further fine-tuned with a massive English QA dataset (e.g., SQuAD). However, QA datasets on such a scale are not available for most other languages. Multilingual BERT-based models (mBERT) are often used to transfer knowledge from high-resource languages to low-resource languages. Since these models are pre-trained with huge text corpora containing multiple languages, they typically learn language-agnostic embeddings for tokens from different languages. However, directly training an mBERT-based QA system for low-resource languages is challenging due to the paucity of training data. In this work, we augment the QA samples of the target language using translation and transliteration into other languages and use the augmented data to fine-tune an mBERT-based QA model, which is already pre-trained in English. Experiments on the Google ChAII dataset show that fine-tuning the mBERT model with translations from the same language family boosts the question-answering performance, whereas the performance degrades in the case of cross-language families. We further show that introducing a contrastive loss between the translated question-context feature pairs during the fine-tuning process prevents such degradation with cross-lingual family translations and leads to marginal improvement. The code for this work is available at https://github.com/gokulkarthik/mucot.
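The contrastive loss between translated question-context pairs is not formulated in the abstract; a common choice for such batch-wise contrastive objectives is an InfoNCE-style loss, sketched below in NumPy under that assumption (the temperature value and names are ours, not the paper's).

```python
import numpy as np

def info_nce(q, c, temperature=0.1):
    """InfoNCE over a batch: the i-th question embedding q[i] should score
    highest against its own (translated) context embedding c[i], i.e. the
    diagonal of the similarity matrix holds the positive pairs."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)     # unit-normalize rows
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    logits = (q @ c.T) / temperature                     # (N, N) cosine sims
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # cross-entropy on diag
```

Minimizing this loss pulls each question toward its translated context and pushes it away from the other contexts in the batch, which is the mechanism that can keep cross-lingual translations from drifting apart in the shared embedding space.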
Constantine Lignos, Nolan Holley, Chester Palen-Michel
et al.
In this position paper, we describe our perspective on how meaningful resources for lower-resourced languages should be developed in connection with the speakers of those languages. We first examine two massively multilingual resources in detail. We explore the contents of the names stored in Wikidata for a few lower-resourced languages and find that many of them are not in fact in the languages they claim to be and require non-trivial effort to correct. We discuss quality issues present in WikiAnn and evaluate whether it is a useful supplement to hand annotated data. We then discuss the importance of creating annotation for lower-resourced languages in a thoughtful and ethical way that includes the languages' speakers as part of the development process. We conclude with recommended guidelines for resource development.