Context: Blockchain and AI are increasingly explored to enhance trustworthiness in software engineering (SE), particularly in supporting software evolution tasks. Method: We conducted a systematic literature review (SLR) using a predefined protocol with clear eligibility criteria to ensure transparency, reproducibility, and minimized bias, synthesizing research on blockchain-enabled trust in AI-driven SE tools and processes. Results: Most studies focus on integrating AI in SE, with only 31% explicitly addressing trustworthiness. Our review highlights six recent studies exploring blockchain-based approaches to reinforce reliability, transparency, and accountability in AI-assisted SE tasks. Conclusion: Blockchain enhances trust by ensuring data immutability, model transparency, and lifecycle accountability, including federated learning with blockchain consensus and private data verification. However, inconsistent definitions of trust and limited real-world testing remain major challenges. Future work must develop measurable, reproducible trust frameworks to enable reliable, secure, and compliant AI-driven SE ecosystems, including applications involving large language models.
Frauke Janelt, Johannes Kauffold, Haukur Lindberg Sigmarsson
et al.
The slaughter of pregnant sows remains a relevant concern in modern swine production, with reported prevalence rates ranging from 1.5% to 13% in Europe. Considering fetal sensitivity during late gestation and legal restrictions on transport and slaughter, reliable assessment of fetal age is of considerable practical, ethical, and legal relevance. In this study, 70 pregnancies from low-prolificacy (purebred German Saddleback) and medium-to-high prolificacy genotypes (purebred German Landrace and Duroc × German Landrace hybrids) were repeatedly examined using transabdominal ultrasonography, with a total of 15 examinations per pregnancy. Seven fetometric parameters (rostro-occipital distance, bi-parietal distance, orbital distance, sternum length, thorax diameter, body diameter, and crown–rump length) were measured in vivo, assessing two fetuses per pregnancy and calculating mean values to account for intra-individual variation. Parameter feasibility varied across gestation: during early gestation (gestation days 38 and 40), orbital distance, sternum length, and crown–rump length could be reliably measured; in mid-gestation, all seven parameters were measurable; in late gestation (from gestational day 87 onward), crown–rump length was no longer measurable, while the remaining six parameters remained assessable for gestational age estimation. Crown–rump length (CRL) increased from a median of 3.2 cm (range 1.9–4.2 cm) at day 38 to 16.3 cm (range 14.0–18.2 cm) at day 77, representing the most practical parameter for determining the stage of gestation. Litter size had no significant effect on fetometric growth, except for a weak correlation with thorax diameter at day 77, and parity showed no measurable influence on any parameter. The results show that fetometric values in modern sow genotypes are smaller than those reported in earlier literature, highlighting the need for updated gestational age assessment.
These findings provide practical guidance for gestational age estimation, supporting the enforcement of animal welfare legislation and potentially contributing to a reduction in the slaughter of highly pregnant sows.
We present WinoMTDE, a new gender bias evaluation test set designed to assess occupational stereotyping and underrepresentation in German machine translation (MT) systems. Building on the automatic evaluation method introduced by arXiv:1906.00591v1, we extend the approach to German, a language with grammatical gender. The WinoMTDE dataset comprises 288 German sentences that are balanced with regard to both gender and stereotype, the latter annotated using German labor statistics. We conduct a large-scale evaluation of five widely used MT systems and a large language model. Our results reveal persistent bias in most models, with the LLM outperforming traditional systems. The dataset and evaluation code are publicly available at https://github.com/michellekappl/mt_gender_german.
Michael Hoffmann, Jophin John, Stefan Schweter
et al.
We present Llama-GENBA-10B, a trilingual foundation model addressing English-centric bias in large language models. Built on Llama 3.1-8B and scaled to 10B parameters, Llama-GENBA-10B is continuously pretrained on 164B tokens (82B English, 82B German, and 80M Bavarian), balancing resources while preventing English dominance. Targeted at the German NLP community, the model also promotes Bavarian as a low-resource language. Development tackled four challenges: (1) curating a multilingual corpus despite Bavarian scarcity, (2) creating a unified tokenizer for English, German, and Bavarian, (3) optimizing architecture and language-ratio hyperparameters for cross-lingual transfer, and (4) establishing the first standardized trilingual evaluation suite by translating German benchmarks into Bavarian. Evaluations show that Llama-GENBA-10B achieves strong cross-lingual performance, with the fine-tuned variant surpassing Apertus-8B-2509 and gemma-2-9b in Bavarian and establishing itself as the best model in its class for this language, while also outperforming EuroLLM in English and matching its results in German. Training on the Cerebras CS-2 demonstrated efficient large-scale multilingual pretraining with documented energy use, offering a blueprint for inclusive foundation models that integrate low-resource languages.
Stefanie Urchs, Veronika Thurner, Matthias Aßenmacher
et al.
Open-access corpora are essential for advancing natural language processing (NLP) and computational social science (CSS). However, large-scale resources for German remain limited, restricting research on linguistic trends and societal issues such as gender bias. We present taz2024full, the largest publicly available corpus of German newspaper articles to date, comprising over 1.8 million texts from taz, spanning 1980 to 2024. As a demonstration of the corpus's utility for bias and discrimination research, we analyse gender representation across four decades of reporting. We find a consistent overrepresentation of men, but also a gradual shift toward more balanced coverage in recent years. Using a scalable, structured analysis pipeline, we provide a foundation for studying actor mentions, sentiment, and linguistic framing in German journalistic texts. The corpus supports a wide range of applications, from diachronic language analysis to critical media studies, and is freely available to foster inclusive and reproducible research in German-language NLP.
Maria Korobeynikova, Alessia Battisti, Lukas Fischer
et al.
Current evaluation of German automatic text simplification (ATS) relies on general-purpose metrics such as SARI, BLEU, and BERTScore, which insufficiently capture simplification quality in terms of simplicity, meaning preservation, and fluency. While specialized metrics like LENS have been developed for English, corresponding efforts for German have lagged behind due to the absence of human-annotated corpora. To close this gap, we introduce DETECT, the first German-specific metric that holistically evaluates ATS quality across all three dimensions of simplicity, meaning preservation, and fluency, and is trained entirely on synthetic large language model (LLM) responses. Our approach adapts the LENS framework to German and extends it with (i) a pipeline for generating synthetic quality scores via LLMs, enabling dataset creation without human annotation, and (ii) an LLM-based refinement step for aligning grading criteria with simplification requirements. To the best of our knowledge, we also construct the largest German human evaluation dataset for text simplification to validate our metric directly. Experimental results show that DETECT achieves substantially higher correlations with human judgments than widely used ATS metrics, with particularly strong gains in meaning preservation and fluency. Beyond ATS, our findings highlight both the potential and the limitations of LLMs for automatic evaluation and provide transferable guidelines for general language accessibility tasks.
The written and oral culture of the Baltic indigenous peoples underwent gradual changes in the late 18th and 19th centuries. According to Wolfgang Welsch, vision is linked with knowledge and science, while hearing relates to faith and religion (Welsch 1996: 248) – this distinction shaped the interaction between oral and written culture. Among Baltic peasants, oral culture remained dominant until the mid-19th century, with the German clergy continuing to control the information space despite ongoing social change. During the Enlightenment, secular Latvian literature began to emerge. Gotthard Friedrich Stender (1714–1796), a German pastor from Kurzeme, laid the foundation for Latvian secular prose, poetry, and popular science literature. However, his songs, the so-called ziņģes, proved more influential than his prose. The songs combine entertainment with moral instruction on drinking, social harmony, and education. Around the turn of the 19th century, major transformations occurred: the territory of present-day Latvia was incorporated into the Russian Empire, Napoleon’s campaigns threatened the region, serfdom was abolished, and a Latvian school network was created. The public demanded information, which was shared through church sermons and, from the 1820s onward, through Latvian newspapers. Supported by Baltic German pastors, the first generation of Latvian intellectuals emerged. By the 1830s, they actively sought to merge oral and written traditions, adapting elements of the Baltic Germans’ peasant Enlightenment project for the purposes of the Latvian national awakening. This paper examines how three key events of the early 19th century – Napoleon’s campaigns and Latvian recruitment into the Russian army, the abolition of serfdom, and the rise of Latvian schools – were reflected in Latvian songs. It analyzes songs published in Latvian newspapers, in books, and on flyers, and it explores the differing perspectives of Baltic Germans and Latvians.
The oldest German charms: issues of textual criticism. Medieval German charms pose two sets of problems for textual criticism: on the one hand, the issue of the charm as a genre and, on the other hand, the complexity of the manuscript transmission. Each critical edition should indeed adopt an appropriate method, which may vary according to the textual genre, the historical period, and the transmission features, in order to get as close as possible to the original text, even when very little is known about its existence. This paper investigates all the known German charms of the 9th and 10th centuries: they share important features, such as a manuscript transmission based on a codex unicus, the marginal position of the text on the page and in the manuscript itself, the rarity of paratextual elements, and the relationship between the Latin and German languages within the text. In this period, all charms are deeply rooted in a monastic environment and were not perceived as “magic”, since they were written in the same books that contained other Christian texts. All these features change again if we consider the charms of the following centuries, and the author of a critical edition must then pay attention to other problems, such as a manuscript tradition based on many variant versions of the same text and on increasing contamination of different motifs merging in similar texts.
Whisper is a state-of-the-art automatic speech recognition (ASR) model (Radford et al., 2022). Although Swiss German dialects are allegedly not part of Whisper's training data, preliminary experiments showed that Whisper can transcribe Swiss German quite well, with the output being a speech translation into Standard German. To gain a better understanding of Whisper's performance on Swiss German, we systematically evaluate it using automatic, qualitative, and human evaluation. We test its performance on three existing test sets: SwissDial (Dogan-Schönberger et al., 2021), STT4SG-350 (Plüss et al., 2023), and Swiss Parliaments Corpus (Plüss et al., 2021). In addition, we create a new test set for this work, based on short mock clinical interviews. For automatic evaluation, we used word error rate (WER) and BLEU. In the qualitative analysis, we discuss Whisper's strengths and weaknesses and analyze some output examples. For the human evaluation, we conducted a survey with 28 participants who were asked to evaluate Whisper's performance. All of our evaluations suggest that Whisper is a viable ASR system for Swiss German, as long as Standard German output is desired.
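For readers unfamiliar with the word error rate mentioned above, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance between hypothesis and reference, divided by the number of reference words. The function name, the whitespace tokenization, and the example sentences are illustrative assumptions; the abstract does not state which implementation or text normalization was used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: plain whitespace tokenization with no casing or
    punctuation normalization; real evaluations usually normalize first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only, not taken from any of the test sets.
wer_example = word_error_rate("das ist ein test", "das war ein test")
```

Because Whisper outputs Standard German for Swiss German input, several valid translations can exist for one utterance, which is presumably why BLEU is reported alongside WER.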
Florian Bremm, Patrick Gustav Blaneck, Tobias Bornheim
et al.
Sexism in online media comments is a pervasive challenge that often manifests subtly, complicating moderation efforts because interpretations of what constitutes sexism can vary among individuals. We study monolingual and multilingual open-source text embeddings to reliably detect sexism and misogyny in German-language online comments from an Austrian newspaper. We observed that classifiers trained on text embeddings closely mimic the individual judgements of human annotators. Our method showed robust performance in the GermEval 2024 GerMS-Detect Subtask 1 challenge, achieving an average macro F1 score of 0.597 (4th place, as reported on Codabench). It also accurately predicted the distribution of human annotations in GerMS-Detect Subtask 2, with an average Jensen-Shannon distance of 0.301 (2nd place). The computational efficiency of our approach suggests potential for scalable applications across various languages and linguistic contexts.
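The Jensen-Shannon distance reported for Subtask 2 compares a predicted label distribution with the distribution of human annotations. Below is a minimal sketch of the standard definition (the square root of the Jensen-Shannon divergence). The base-2 logarithm, the function name, and the example distributions are assumptions for illustration; the abstract does not specify how the shared task computes or averages the score.

```python
import math
from typing import Sequence


def jensen_shannon_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the Jensen-Shannon divergence between p and q.

    Assumption: base-2 logarithm, so the result lies in [0, 1].
    Inputs are normalized to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("distributions must have positive mass")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl_divergence(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a == 0 contribute nothing; b is the mixture here,
        # so b > 0 whenever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


# Hypothetical example: annotator label shares versus model prediction.
annotator_shares = [0.6, 0.2, 0.2]
predicted_shares = [0.5, 0.3, 0.2]
distance = jensen_shannon_distance(annotator_shares, predicted_shares)
```

A distance of 0 means the predicted distribution matches the annotators exactly, and 1 (under the base-2 assumption) means the two distributions share no mass.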
In this work, we propose EASSE-multi, a framework for easier automatic sentence evaluation for languages other than English. Unlike the original EASSE framework, EASSE-multi is not limited to English. It contains tokenizers and versions of text simplification (TS) evaluation metrics that are suitable for multiple languages. In this paper, we exemplify the usage of EASSE-multi for German TS, resulting in EASSE-DE. Further, we compare text simplification results when evaluating with different language or tokenization settings of the metrics. Based on this, we formulate recommendations on how to make the evaluation of (German) TS models more transparent and more comparable. The code of EASSE-multi and its German specialisation (EASSE-DE) can be found at https://github.com/rstodden/easse-de.
The topic of mobility contributes in multiple ways to a deeper understanding of cultural history. The degree of mobility has much to say about the development of any society, both in the past and in the present. This paper examines the situation in sixteenth-century Europe through the lens of literary documents in which we can find comments on travel, mobility, and world perspectives. While it might not be possible to identify explicit documents from that period reflecting on mobility itself (technologies, modes of transportation, hospitality, healthcare, finances, etc.), many authors actually included valuable references to this phenomenon, if we only look more closely. The literary narrative thus emerges as an important source of information about social, emotional, economic, religious, and also travel aspects, such as shipping, use of a coach, a horse, or mule, staying in early-modern ‘hotels,’ roads, and bridges. As the analysis will demonstrate, early modern society was highly mobile, with representatives of many different social classes on the move for a wide range of reasons. Whereas the authors consulted here did not specifically signal their interest in reflecting on mobility as such, they commonly reveal that the narrative framework mirrors events on the road, on a ship, or at meetings that many people attended, such as a Church council, an imperial diet, and the like. The need to travel grew tremendously in the sixteenth century, for many different reasons. One of the consequences was that poets increasingly engaged with a highly mobile society.
Electronic correspondence at a university takes place in an asymmetric relation between student and academic teacher, and it very often contains various kinds of requests or inquiries directed by students to academic teachers. Lecturers are less likely than students to make requests, but the commands and instructions they give tend to take a polite form based on the word "please", which suggests an act of request. The aim of this article is to analyze the polite expressions used by students and academic teachers in e-mails containing requests that they address to each other. The research corpus includes 393 examples of student e-mails to faculty and 405 examples of faculty e-mails to students. The examples come from 224 works by third-year full-time and fourth-year part-time students of various technical faculties of the Kielce University of Technology. The works analyzed were written by a total of 400 students (152 women and 248 men). The linguistic material was collected in 2016/2017 and 2017/2018.
The article is devoted to the analysis of the linguistic representation and functional features of the metaphorical complexes death and movement in German-language novel discourse. The research is based on the novel «Tyll» (2017) by the contemporary German writer D. Kehlmann. The novel tells the story of Till Eulenspiegel, a character of German medieval legends, portrayed as a vagabond and artist and placed in the landscape of the Thirty Years’ War. The relevance of the research is due to the lack of systematic studies of the novel metaphor and its dominant role in the process of meaning generation in literary texts at the «new turn of the century». The author used an integral approach to the problem under study. The initial theoretical premise of the study is the recognition of the considerable cognitive potential of metaphor, realized in the process of artistic cognition. The research found that D. Kehlmann embodies the metaphorical complex death in the novel through a sequence of figurative rows that create a metaphorical connection between the arrival of Till Eulenspiegel, the harbinger of death, and the onset of war. The author suggests considering the formation of the metaphorical complex movement in a series of text fragments built on verbs with the semantics of movement, spatial prepositions, and adverbs of repetition. Attempting to characterize the rhythmic specificity of the novel, the author concludes that the main method used by D. Kehlmann is the justification of rhythmic expectation. The rhythm of the text is consistent with the conceptual dance of the main character through subjectively ranged gradation rows with an ascending increment of semantic and emotionally expressive significance, which determine the author’s modality. The metaphorical complexes death and movement have prognostic and compositional functions.
The results obtained confirm the metaphoricity of the novel under study and the necessity to analyze other metaphorical complexes as part of the central novel metaphor as an important category of modern German literature.
Study Design: Expert opinion. Objectives: Osteoporotic vertebral fractures are of increasing medical importance. For an adequate treatment strategy, an easy and reliable classification is needed. Methods: The working group “Osteoporotic Fractures” of the Spine Section of the German Society for Orthopaedics and Trauma (DGOU) has developed a classification system (OF classification) for osteoporotic thoracolumbar fractures. The consensus decision followed an established pathway including review of the current literature. Results: The OF classification consists of 5 groups: OF 1, no vertebral deformation (vertebral edema); OF 2, deformation with no or minor (<1/5) involvement of the posterior wall; OF 3, deformation with distinct involvement (>1/5) of the posterior wall; OF 4, loss of integrity of the vertebral frame or vertebral body collapse or pincer-type fracture; OF 5, injuries with distraction or rotation. The interobserver reliability was substantial (κ = .63). Conclusions: The proposed OF classification is easy to use and provides superior clinical differentiation of the typical osteoporotic fracture morphologies.
We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech, annotated with Standard German text at the sentence level. The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record. We make the corpus publicly available. It contains 343 hours of speech from all dialect regions and is the largest public speech corpus for Swiss German to date. Application areas include automatic speech recognition (ASR), text-to-speech, dialect identification, and speaker recognition. Dialect information, age group, and gender of the 316 speakers are provided. Genders are equally represented and the corpus includes speakers of all ages. Roughly the same amount of speech is provided per dialect region, which makes the corpus ideally suited for experiments with speech technology for different dialects. We provide training, validation, and test splits of the data. The test set consists of the same spoken sentences for each dialect region and allows a fair evaluation of the quality of speech technologies in different dialects. We train an ASR model on the training set and achieve an average BLEU score of 74.7 on the test set. The model beats the best published BLEU scores on 2 other Swiss German ASR test sets, demonstrating the quality of the corpus.
In this work, we studied the synthesis of Swiss German speech using different Text-to-Speech (TTS) models. We evaluated the TTS models on three corpora and found that VITS models performed best; we therefore used them for further testing. We also introduce a new method to evaluate TTS models by letting the discriminator of a trained vocoder GAN model predict whether a given waveform is human or synthesized. In summary, our best model delivers speech synthesis for different Swiss German dialects with previously unachieved quality.
Thorben Schomacker, Michael Gille, Jörg von der Hülls
et al.
This paper examines the current state of the art in German text simplification, focusing on parallel and monolingual German corpora. It reviews neural language models for simplifying German texts and assesses their suitability for legal texts and accessibility requirements. Our findings highlight the need for additional training data and more appropriate approaches that consider the specific linguistic characteristics of German, as well as the importance of the needs and preferences of target groups with cognitive or language impairments. The authors launched the interdisciplinary OPEN-LS project in April 2023 to address these research gaps. The project aims to develop a framework for text formats tailored to individuals with low literacy levels, integrate legal texts, and enhance comprehensibility for those with linguistic or cognitive impairments. It will also explore cost-effective ways to enhance the data with audience-specific illustrations using image-generating AI. For further, up-to-date information, please visit our project homepage: https://open-ls.entavis.com