How Do Lexical Senses Correspond Between Spoken German and German Sign Language?
Melis Çelikkol, Wei Zhao
Sign language lexicographers construct bilingual dictionaries by establishing word-to-sign mappings, where polysemous and homonymous words corresponding to different signs across contexts are often underrepresented. A usage-based approach examining how word senses map to signs can identify such novel mappings absent from current dictionaries, enriching lexicographic resources. We address this by analyzing German and German Sign Language (Deutsche Gebärdensprache, DGS), manually annotating 1,404 word use-to-sign ID mappings derived from 32 words from the German Word Usage Graph (D-WUG) and 49 signs from the Digital Dictionary of German Sign Language (DW-DGS). We identify three correspondence types: Type 1 (one-to-many), Type 2 (many-to-one), and Type 3 (one-to-one), plus No Match cases. We evaluate two computational methods: Exact Match (EM) and Semantic Similarity (SS) using SBERT embeddings. SS substantially outperforms EM overall (88.52% vs. 71.31%), with dramatic gains for Type 1 (+52.1 pp). Our work establishes the first annotated dataset for cross-modal sense correspondence and reveals which correspondence patterns are computationally identifiable. Our code and dataset are made publicly available.
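The contrast between the two matching strategies can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that Exact Match compares normalized gloss strings and that Semantic Similarity picks the sign whose embedding has the highest cosine similarity above a threshold. The sign IDs, glosses, threshold, and two-dimensional vectors are hypothetical stand-ins for real DW-DGS entries and SBERT embeddings.

```python
import math

def exact_match(sense_gloss, sign_glosses):
    """Return sign IDs whose gloss equals the sense gloss after normalization."""
    target = sense_gloss.strip().lower()
    return [sid for sid, gloss in sign_glosses.items()
            if gloss.strip().lower() == target]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_match(sense_vec, sign_vecs, threshold=0.7):
    """Return the most similar sign ID, or None if no score reaches the threshold."""
    scored = {sid: cosine(sense_vec, vec) for sid, vec in sign_vecs.items()}
    if not scored:
        return None
    best = max(scored, key=scored.get)
    return best if scored[best] >= threshold else None

# Hypothetical toy data: glosses and hand-made 2-d "embeddings".
sign_glosses = {"SIGN_A": "financial institution", "SIGN_B": "bench"}
sign_vecs = {"SIGN_A": [0.9, 0.1], "SIGN_B": [0.2, 0.8]}

em_result = exact_match("seat in a park", sign_glosses)  # wording differs
ss_result = semantic_match([0.1, 0.9], sign_vecs)        # vectors are close
```

In this toy case the string comparison finds nothing because the wording differs, while the vector comparison still selects `SIGN_B`. That is one plausible way for a similarity-based method to recover mappings that exact matching misses.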
A Pandemic for the Good of Digital Literacy? An Empirical Investigation of Newly Improved Digital Skills during COVID-19 Lockdowns
German Neubaum, Irene-Angelica Chounta, Eva Gredel
et al.
This research explores whether the rapid digital transformation due to COVID-19 managed to close or exacerbate the digital divide concerning users' digital skills. We conducted a pre-registered survey with N = 1143 German Internet users. Our findings suggest the latter: younger, male, and higher educated users were more likely to improve their digital skills than older, female, and less educated ones. According to their accounts, the pandemic helped Internet users improve their skills in communicating with others by using video conference software and reflecting critically upon information they found online. These improved digital skills exacerbated not only positive (e.g., feeling informed and safe) but also negative (e.g., feeling lonely) effects of digital media use during the pandemic. We discuss this research's theoretical and practical implications regarding the impact of challenges, such as technological disruption and health crises, on humans' digital skills, capabilities, and future potential, focusing on the second-level digital divide.
Voice Adaptation for Swiss German
Samuel Stucki, Jan Deriu, Mark Cieliebak
This work investigates the performance of Voice Adaptation models for Swiss German dialects, i.e., translating Standard German text to Swiss German dialect speech. For this, we preprocess a large dataset of Swiss podcasts, which we automatically transcribe and annotate with dialect classes, yielding approximately 5000 hours of weakly labeled training material. We fine-tune the XTTSv2 model on this dataset and show that it achieves good scores in human and automated evaluations and can correctly render the desired dialect. Our work shows a step towards adapting Voice Cloning technology to underrepresented languages. The resulting model achieves CMOS scores of up to -0.28 and SMOS scores of 3.8.
SwissGPC v1.0 -- The Swiss German Podcasts Corpus
Samuel Stucki, Mark Cieliebak, Jan Deriu
We present SwissGPC v1.0, the first mid-to-large-scale corpus of spontaneous Swiss German speech, developed to support research in ASR, TTS, dialect identification, and related fields. The dataset consists of links to talk shows and podcasts hosted on Schweizer Radio und Fernsehen and YouTube, which contain approximately 5400 hours of raw audio. After segmentation and weak annotation, nearly 5000 hours of speech were retained, covering the seven major Swiss German dialect regions alongside Standard German. We describe the corpus construction methodology, including an automated annotation pipeline, and provide statistics on dialect distribution, token counts, and segmentation characteristics. Unlike existing Swiss German speech corpora, which primarily feature controlled speech, this corpus captures natural, spontaneous conversations, making it a valuable resource for real-world speech applications.
La norma violata. Metafore del corpo in medicina e nella prosa kafkiana
Alessandra Zurolo
The concept of body is closely related to that of illness and its various manifestations, a connection that can be observed from a range of perspectives. The present study draws upon the metaphors of the body found in selected German medical textbooks and aims to compare the image of the healthy and the pathological body in the tradition of German-language medical education with how it is presented in selected writings by the author who perhaps most emblematically – within the sphere of German literature – provided examples of the theme in question: Franz Kafka. The politicisation of the body, its presentation as a space of definition, refusal, violation, renegotiation of the norm (which recalls Foucault’s thought) and the subsequent definition and manifestation of the pathological are indeed revealed in Franz Kafka’s work in an exemplary manner. The study will explore the differences and points of contact between the medical notion of the body and its artistic-literary representation, starting from the metaphor of illness as a violation of the norm, as found in medical textbooks and selected novels. The analysis thus offers a contribution to the definition of the body in both the medical and literary spheres, and, more generally, to the description of the different manifestations, functions and relationships of the use of metaphor for the same concept in different fields of knowledge.
History of Austria. Liechtenstein. Hungary. Czechoslovakia
Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect
Jannis Vamvas, Noëmi Aepli, Rico Sennrich
Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation. In this paper, we build on several existing multilingual encoders and adapt them to Swiss German using continued pre-training. Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance. We further find that for the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies. We release our code and the models trained for our experiments at https://github.com/ZurichNLP/swiss-german-text-encoders
Nachruf Frau Prof. Dr. Gisela Klann-Delius
Nadezda Eryıldız
This short contribution honours Prof. Dr. Gisela Klann-Delius, who devoted her life to research on language acquisition. Her two most important textbooks are also briefly introduced.
Postmigrantische Gesellschaftsnarrative in der jüngsten deutschtürkischen Literatur im Fokus: Eine Betrachtung von ÖZIRIS Vatermal und ALTINTAŞ’ Im Morgen wächst ein Birnbaum
Simge Yilmaz
This article examines the postmigrant in current German-Turkish literature and analyses how the engagement with Turkish politics unfolds through characters and narrative perspectives in the works Vatermal by NECATI ÖZIRI (2023) and Im Morgen wächst ein Birnbaum by FIKRI ANIL ALTINTAŞ (2023). The study shows that the first-person narrators, through their relationships with their fathers and the ways in which these relationships shape identity in the country of immigration, offer deep insights into the postmigrant experience. It also becomes clear how the literary texts reflect complex aspects of identity formation and of the postmigrant perspective.
German literature, Philology. Linguistics
Mandatory Nonfinancial Disclosure and Its Consequences on the Sustainability Reporting Quality of Italian and German Companies
Giorgio Mion, C. R. Loza Adaui
Companies disclosing nonfinancial information through sustainability reporting practices provide markets with data on their social, environmental, and governance performance. The quality of sustainability reporting is much discussed in the literature because this quality affects factors such as the credibility of accountability and building stakeholders’ trust in the company. Nonetheless, the concept of quality is multidimensional, and empirical evidence relating to the quality of sustainability reporting presents different findings. Regulations on mandatory nonfinancial disclosure (NFD) open new perspectives for research on sustainability reporting quality (SRQ). This study explored the effect of introducing mandatory NFD on SRQ by focusing on the effects of new legislation (Directive 2014/95/EU) introduced in Italy and Germany. The analysis was conducted through qualitative content analysis of the sustainability reporting practices of Italian and German companies in the top lists of stock exchanges. Sustainability reporting practices of one year before (2016) and one year after (2017) the implementation of Directive 2014/95/EU were compared. The results of 132 observations demonstrated that the quality of sustainability reporting increased after implementation of the law on mandatory NFD. Further, the effect of the law seemed to reduce the differences in SRQ of the two countries before the introduction of mandatory NFD. The results suggested that obligatoriness of NFD affects SRQ together with other relevant determinants focused on by previous research (e.g., company size and industry type).
Evaluasi penerapan model pembelajaran inkuiri terbimbing dalam pembelajaran kimia : Suatu tinjauan sistematis literatur
Ainayya Almira, Anisah Rachmawati, Insi Norma Jelita
et al.
The aim of this research is to provide insight to chemistry education teachers and researchers regarding the effectiveness of the guided inquiry learning model and to provide direction for further research in this field. The research method used is a Systematic Literature Review (SLR), which helps compile and evaluate the various studies related to the guided inquiry learning model. The review presents results from articles discussing the application of this model in chemistry learning, exploring its definition, application, strengths, weaknesses, and effectiveness. The results show that the model can be applied in both the theoretical and practical aspects of chemistry learning. Its advantages are that it involves students actively, increases learning independence, and gives students the opportunity to discuss and find their own answers. Students who study with this model tend to have higher learning achievements. However, there are also disadvantages, such as the time required to implement the model and obstacles in dealing with students who are not yet familiar with this approach.
Ontologies in Digital Twins: A Systematic Literature Review
Erkan Karabulut, Salvatore F. Pileggi, Paul Groth
et al.
Digital Twins (DT) facilitate monitoring and reasoning processes in cyber-physical systems. They have progressively gained popularity over the past years because of intense research activity and industrial advancements. Cognitive Twins is a novel concept, recently coined to refer to the involvement of Semantic Web technology in DTs. Recent studies address the relevance of ontologies and knowledge graphs in the context of DTs, in terms of knowledge representation, interoperability and automatic reasoning. However, there is no comprehensive analysis of how semantic technologies, and specifically ontologies, are utilized within DTs. This Systematic Literature Review (SLR) is based on the analysis of 82 research articles that either propose or benefit from ontologies with respect to DTs. The paper uses different analysis perspectives, including a structural analysis based on a reference DT architecture, and an application-specific analysis that addresses the different domains, such as Manufacturing and Infrastructure. The review also identifies open issues and possible research directions on the usage of ontologies and knowledge graphs in DTs.
A New Aligned Simple German Corpus
Vanessa Toborek, Moritz Busch, Malte Boßert
et al.
"Leichte Sprache", the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German -- German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
German Parliamentary Corpus (GerParCor)
Giuseppe Abrami, Mevlüt Bagci, Leon Hammerla
et al.
Parliamentary debates represent a large and partly unexploited treasure trove of publicly accessible texts. In the German-speaking area, there is a certain deficit of uniformly accessible and annotated corpora covering all German-speaking parliaments at the national and federal level. To address this gap, we introduce the German Parliamentary Corpus (GerParCor). GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state and federal level data. In addition, GerParCor contains conversions of scanned protocols and, in particular, of protocols in Fraktur converted via an OCR process based on Tesseract. All protocols were preprocessed by means of the NLP pipeline of spaCy3 and automatically annotated with metadata regarding their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can be used as a large corpus of historical texts in the field of political communication for various tasks in NLP.
A Bayesian treatment of the German tank problem
Cory M. Simon
The German tank problem has an interesting historical background and is an engaging problem of statistical estimation for the classroom. The objective is to estimate the size of a population of tanks inscribed with sequential serial numbers, from a random sample. In this tutorial article, we outline the Bayesian approach to the German tank problem, (i) whose solution assigns a probability to each tank population size, thereby quantifying uncertainty, and (ii) which provides an opportunity to incorporate prior information and/or beliefs about the tank population size into the solution. We illustrate with an example. Finally, we survey problems in other contexts that resemble the German tank problem.
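The Bayesian approach described above can be sketched in a few lines. This is a minimal illustration, not the article's own code. It assumes that k tanks are sampled uniformly without replacement, so every size-k set of serial numbers has likelihood 1/C(N, k) whenever N is at least the largest observed serial number. It also assumes a uniform prior over population sizes 1..n_max. The function names, the cap `n_max`, and the example serial numbers are hypothetical.

```python
from math import comb

def tank_posterior(serials, n_max):
    """Posterior P(N | serials) under a uniform prior on N in 1..n_max.

    Sampling is assumed to be without replacement, so the likelihood of the
    observed set is 1 / C(N, k) for N >= max(serials) and 0 otherwise.
    """
    k = len(serials)
    m = max(serials)
    if n_max < m:
        raise ValueError("n_max must be at least the largest observed serial")
    # Unnormalized posterior: the uniform prior is constant and cancels out.
    weights = {n: 1.0 / comb(n, k) for n in range(m, n_max + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

def posterior_mean(posterior):
    """Expected population size under the posterior."""
    return sum(n * p for n, p in posterior.items())

# Hypothetical sample of four captured serial numbers.
posterior = tank_posterior([19, 40, 42, 60], n_max=1000)
most_probable = max(posterior, key=posterior.get)
expected_size = posterior_mean(posterior)
```

The posterior assigns zero probability to any population smaller than the largest observed serial number and peaks exactly there, while the posterior mean sits higher because of the long right tail. A different prior can be plugged in by multiplying each weight by the prior probability of that population size before normalizing.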
Advancing Data Justice Research and Practice: An Integrated Literature Review
David Leslie, Michael Katell, Mhairi Aitken
et al.
The Advancing Data Justice Research and Practice (ADJRP) project aims to widen the lens of current thinking around data justice and to provide actionable resources that will help policymakers, practitioners, and impacted communities gain a broader understanding of what equitable, freedom-promoting, and rights-sustaining data collection, governance, and use should look like in increasingly dynamic and global data innovation ecosystems. In this integrated literature review we hope to lay the conceptual groundwork needed to support this aspiration. The introduction motivates the broadening of data justice that is undertaken by the literature review which follows. First, we address how certain limitations of the current study of data justice drive the need for a re-location of data justice research and practice. We map out the strengths and shortcomings of the contemporary state of the art and then elaborate on the challenges faced by our own effort to broaden the data justice perspective in the decolonial context. The body of the literature review covers seven thematic areas. For each theme, the ADJRP team has systematically collected and analysed key texts in order to tell the critical empirical story of how existing social structures and power dynamics present challenges to data justice and related justice fields. In each case, this critical empirical story is also supplemented by the transformational story of how activists, policymakers, and academics are challenging longstanding structures of inequity to advance social justice in data innovation ecosystems and adjacent areas of technological practice.
Dual Control Strategy for Grid-tied Battery Energy Storage Systems to Comply with Emerging Grid Codes and Fault Ride Through Requirements
Maxime Berger, Ilhan Kocar, Evangelos Farantatos
et al.
Battery energy storage systems (BESSs) need to comply with grid code and fault ride through (FRT) requirements during disturbances whether they are in charging or discharging mode. Previous literature has shown that constant charging current control of BESSs in charging mode can prevent BESSs from complying with emerging grid codes such as the German grid code under stringent unbalanced fault conditions. To address this challenge, this paper proposes a new FRT-activated dual control strategy that consists of switching from constant battery current control to constant DC-link voltage control through a positive droop structure. The results show that the strategy ensures proper DC-link voltage and current management as well as adequate control of the positive- and negative-sequence active and reactive currents according to the grid code priority. It is also shown that the proposed FRT control strategy is tolerant to initial operating conditions of BESS plant, grid code requirements, and fault severity.
Production of electric energy or power. Powerplants. Central stations, Renewable energy sources
Emotion Stimulus Detection in German News Headlines
Bao Minh Doan Dang, Laura Oberländer, Roman Klinger
Emotion stimulus extraction is a fine-grained subtask of emotion analysis that focuses on identifying the description of the cause behind an emotion expression from a text passage (e.g., in the sentence "I am happy that I passed my exam" the phrase "passed my exam" corresponds to the stimulus.). Previous work mainly focused on Mandarin and English, with no resources or models for German. We fill this research gap by developing a corpus of 2006 German news headlines annotated with emotions and 811 instances with annotations of stimulus phrases. Given that such corpus creation efforts are time-consuming and expensive, we additionally work on an approach for projecting the existing English GoodNewsEveryone (GNE) corpus to a machine-translated German version. We compare the performance of a conditional random field (CRF) model (trained monolingually on German and cross-lingually via projection) with a multilingual XLM-RoBERTa (XLM-R) model. Our results show that training with the German corpus achieves higher F1 scores than projection. Experiments with XLM-R outperform their respective CRF counterparts.
Ludwig Winder in der Deutschen Zeitung Bohemia. Prolegomena zu einem tschechoslowakischen Journalisten
Jan Budňák
This paper pursues two aims. First, it shows by way of example a connecting line between the novels and the essayistic work of Ludwig Winder, based on his journalistic treatment of 'people of will and power', who are also central to his novels. Second, based on selected journalistic articles in the Deutsche Zeitung Bohemia (DZB), it anchors Winder in the cultural and political milieu of the first Czechoslovak Republic. Both aims lead to one conclusion: many more cultural-political contexts are relevant for Ludwig Winder than the few in which his writing has been placed to date, primarily Jewish and German literature in Prague. The context whose relevance for Winder is outlined in this article is the one that unfolds from the position of a journalist from Czechoslovakia who writes in German (while also taking into account culture, politics, and literature in the Czech language).
Germanic languages. Scandinavian languages, History of Northern Europe. Scandinavia
GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines
Florian Borchert, Christina Lohr, Luise Modersohn
et al.
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines for oncology. This corpus is one of the largest ever built from German medical documents. Unlike clinical documents, clinical guidelines do not contain any patient-related information and can therefore be used without data protection restrictions. Moreover, GGPONC is the first corpus for the German language covering diverse conditions in a large medical subfield and provides a variety of metadata, such as literature references and evidence levels. By applying and evaluating existing medical information extraction pipelines for German text, we are able to draw comparisons for the use of medical language to other corpora, medical and non-medical ones.
COVID-19 Kaggle Literature Organization
Maksim Ekin Eren, Nick Solovyev, Edward Raff
et al.
In 2020, the world faced the devastating outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), the virus that causes COVID-19. Research in the subject matter was fast-tracked to such a point that scientists were struggling to keep up with new findings. With this increase in the scientific literature, there arose a need for organizing those documents. We describe an approach to organize and visualize the scientific literature on or related to COVID-19 using machine learning techniques so that papers on similar topics are grouped together. By doing so, the navigation of topics and related papers is simplified. We implemented this approach using the widely recognized CORD-19 dataset to present a publicly available proof of concept.