Results for "German literature"

Showing 20 of ~8498792 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

arXiv Open Access 2026
GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark

Lotta Kiefer, Christoph Leiter, Sotaro Takeshita et al.

Authorship verification (AV) is the task of determining whether two texts were written by the same author and has been studied extensively, predominantly for English data. In contrast, large-scale benchmarks and systematic evaluations for other languages remain scarce. We address this gap by introducing GerAV, a comprehensive benchmark for German AV comprising over 600k labeled text pairs. GerAV is built from Twitter and Reddit data, with the Reddit part further divided into in-domain and cross-domain message-based subsets, as well as a profile-based subset. This design enables controlled analysis of the effects of data source, topical domain, and text length. Using the provided training splits, we conduct a systematic evaluation of strong baselines and state-of-the-art models and find that our best approach, a fine-tuned large language model, outperforms recent baselines by up to 0.09 absolute F1 score and surpasses GPT-5 in a zero-shot setting by 0.08. We further observe a trade-off between specialization and generalization: models trained on specific data types perform best under matching conditions but generalize less well across data regimes, a limitation that can be mitigated by combining training sources. Overall, GerAV provides a challenging and versatile benchmark for advancing research on German and cross-domain AV.

en cs.CL
arXiv Open Access 2026
SteuerLLM: Local specialized large language model for German tax law analysis

Sebastian Wind, Jeta Sopa, Laurin Schmid et al.

Large language models (LLMs) demonstrate strong general reasoning and language understanding, yet their performance degrades in domains governed by strict formal rules, precise terminology, and legally binding structure. Tax law exemplifies these challenges, as correct answers require exact statutory citation, structured legal argumentation, and numerical accuracy under rigid grading schemes. We algorithmically generate SteuerEx, the first open benchmark derived from authentic German university tax law examinations. SteuerEx comprises 115 expert-validated examination questions spanning six core tax law domains and multiple academic levels, and employs a statement-level, partial-credit evaluation framework that closely mirrors real examination practice. We further present SteuerLLM, a domain-adapted LLM for German tax law trained on a large-scale synthetic dataset generated from authentic examination material using a controlled retrieval-augmented pipeline. SteuerLLM (28B parameters) consistently outperforms general-purpose instruction-tuned models of comparable size and, in several cases, substantially larger systems, demonstrating that domain-specific data and architectural adaptation are more decisive than parameter scale for performance on realistic legal reasoning tasks. All benchmark data, training datasets, model weights, and evaluation code are released openly to support reproducible research in domain-specific legal artificial intelligence. A web-based demo of SteuerLLM is available at https://steuerllm.i5.ai.fau.de.

en cs.CL, cs.AI
DOAJ Open Access 2025
Lärdom och läsande kring sekelskiftet 1800

Peter Josephson

During the latter half of the 18th century, the doors of German universities were opened from within. Scholars abandoned Latin in favor of the vernacular and turned outward to a broader audience. Philosophers and scientists saw in the printing press a tool that would help eradicate ignorance and superstition, ultimately laying the groundwork for a better society. However, the enthusiasm was short-lived. Influential scholars increasingly lamented by the turn of the 19th century that competition for readers' attention had compromised the quality of science. Many described a situation of intellectual undercutting and warned that popular science was displacing the genuine. In "Hermeneutics for the People!", I examine the late 18th and early 19th-century German-language debate on the increasing commercialization of academic knowledge. Specifically, I analyze how one sought to stimulate demand for serious and educational literature by fostering people into good and quality-conscious readers.

History of scholarship and learning. The humanities
DOAJ Open Access 2025
Lin Jaldati et le chant yiddish en RDA

Laurence GUILLON

This article focuses on the singer Lin Jaldati (1912-1988), a Jewish communist and concentration camp survivor, who moved to the GDR in 1952 with her husband, the pianist Eberhard Rebling. It is not easy to see her art as belonging to any form of avant-garde, given that she was part of the cultural elite for more than three decades in the GDR and that her work consisted of reviving a past repertoire in danger of disappearing, according to the programmatic formula “Doss lid is geblibn”. Yet the context in which this undertaking took place makes it a notable exception, which could be labelled as “avant-garde”, given the almost total annihilation of Jewish life and Yiddish culture in the GDR, as well as its attempt to combine Yiddish tradition with contemporary influences – a paradoxical avant-garde, in short, which an analysis through the prism of gender makes even more interesting.

German literature
DOAJ Open Access 2025
Magic in Old Norse-Icelandic literature: a typology of modes

Stephen A. Mitchell

Magic often plays a significant role in medieval European narratives, where it can be used in a variety of ways, including as a literary tool. In this essay, I briefly consider magic as a narrative device and propose a typology of modes of presentation (general, detailed, and explicit), and argue that Old Norse-Icelandic literature appears to engage in an especially wide array of narrative presentations of magic, particularly when contrasted with comparable materials from elsewhere in northern Europe.

German literature, Philology. Linguistics
DOAJ Open Access 2025
Deutsche Sprache im Großen Gymnasium der multikulturellen Stadt Osijek als Träger der mitteleuropäischen Kulturwerte – historischer Ansatz

Ljubica Kordić

Multilingualism and multiculturalism are inseparable components of the identity of the city of Osijek, in which the German language has played a distinctive role. The introduction to this paper presents the multicultural and multilingual milieu of Osijek from the 17th to the 20th century as the basis for the development of the Essekerisch dialect. The study centers on the German language as one of the carriers of Central European values, culture, and worldview. Those values strongly influenced the way of life of the local population and the momentum of the city's cultural and economic life in the second half of the 19th and the beginning of the 20th century. Although Croatian was declared the official language around the middle of the 19th century, German was still the spoken language of the city in the multilingual milieu of Osijek in the first decades of the 20th century. At its Classical (Great) Gymnasium, German long served as the language of instruction and, depending on the pupils' mother tongue, was their first or second language. Based on school records preserved in the Museum of the City of Osijek and in the State Archives in the holdings of the Classical Gymnasium, the teaching content and required reading in the subject "German Language" at this school are investigated and analyzed. The aim of the study is to determine which values and worldviews were conveyed to pupils through German lessons. An interesting source of information on the values and educational goals promoted at the Classical Gymnasium of Osijek at the time is the set of topics of the compulsory essays in German, lists of which can be found in the school's regular annual reports.
The concluding part of the paper also discusses some negative aspects of life in a multicultural city, where, under varying sociopolitical circumstances, there was a struggle for dominance.

German literature, Philology. Linguistics
arXiv Open Access 2025
On the Overestimation of Efficiency in Relativistic Electron Scattering

Grant Brassem, Christian Viernes, German Sciaini

Recent reviews in ultrafast electron diffraction (UED) have claimed that relativistic electrons exhibit enhanced elastic scattering efficiency, frequently quantified as a γ^2 increase in the differential cross section. These claims, however, originate from angular-domain analyses that overlook the compression of scattering angles θ with increasing electron energy, leading to an apparent, but artificial, enhancement. In this work, we recast the problem in momentum-transfer space q, where scattering is accurately accounted for. This transformation eliminates the angular compression artefact and reveals that high-energy scaling follows a simple β^{-2} dependence, with no intrinsic relativistic gain. We demonstrate this by directly integrating relativistic differential elastic-scattering cross sections from ELSEPA and by applying a straightforward transformation of the well-known Mott-Massey formalism into q-space. The results are general, with calculations performed for elements from carbon to gold and for energies between 50 keV and 5000 keV. They reproduce the long-established trend in total elastic scattering cross sections, in which scattering strength decreases with increasing electron kinetic energy. Practically, at energies above roughly 50 keV, scattering is already dominated by the forward direction, and most of the scattered intensity falls within the acceptance range of typical UED detectors. These findings correct a widespread misconception in the UED literature and provide a more accurate and intuitive framework for interpreting and optimizing high-energy electron scattering experiments.

en physics.acc-ph
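
The β^{-2} scaling named in the abstract above can be tabulated with a minimal Python sketch. It uses only standard relativistic kinematics (γ = 1 + T/m_e c², β = √(1 − 1/γ²)) and does not reproduce the authors' ELSEPA or Mott-Massey calculations. The function names and the chosen energies are illustrative assumptions; only the 50 keV to 5000 keV range comes from the abstract.

```python
import math

# Electron rest energy m_e c^2 in keV (standard value, rounded).
ELECTRON_REST_ENERGY_KEV = 510.99895


def lorentz_factors(kinetic_energy_kev):
    """Return (gamma, beta) for an electron with the given kinetic energy in keV."""
    gamma = 1.0 + kinetic_energy_kev / ELECTRON_REST_ENERGY_KEV
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return gamma, beta


def inverse_beta_squared(kinetic_energy_kev):
    """Kinematic factor beta^-2 only; element-dependent prefactors are omitted."""
    _, beta = lorentz_factors(kinetic_energy_kev)
    return 1.0 / beta**2


if __name__ == "__main__":
    # Illustrative energies spanning the range quoted in the abstract.
    for energy in (50, 200, 1000, 5000):
        gamma, beta = lorentz_factors(energy)
        print(f"T = {energy:5d} keV  gamma = {gamma:7.3f}  "
              f"beta = {beta:.4f}  beta^-2 = {inverse_beta_squared(energy):.4f}")
```

Under these assumptions the factor β^{-2} falls monotonically toward 1 as the kinetic energy grows, which is consistent with the decreasing trend the abstract describes.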
arXiv Open Access 2025
Unsupervised Classification of English Words Based on Phonological Information: Discovery of Germanic and Latinate Clusters

Takashi Morita, Timothy J. O'Donnell

Cross-linguistically, native words and loanwords follow different phonological rules. In English, for example, words of Germanic and Latinate origin exhibit different stress patterns, and a certain syntactic structure, double-object datives, is predominantly associated with Germanic verbs rather than Latinate verbs. From the perspective of language acquisition, however, such etymology-based generalizations raise learnability concerns, since the historical origins of words are presumably inaccessible information for general language learners. In this study, we present computational evidence indicating that the Germanic-Latinate distinction in the English lexicon is learnable from the phonotactic information of individual words. Specifically, we performed an unsupervised clustering on corpus-extracted words, and the resulting word clusters largely aligned with the etymological distinction. The model-discovered clusters also recovered various linguistic generalizations documented in the previous literature regarding the corresponding etymological classes. Moreover, our model also uncovered previously unrecognized features of the quasi-etymological clusters. Taken together with prior results from Japanese, our findings indicate that the proposed method provides a general, cross-linguistic approach to discovering etymological structure from phonotactic cues in the lexicon.

en cs.CL
arXiv Open Access 2025
Classifying German Language Proficiency Levels Using Large Language Models

Elias-Leander Ahlers, Witold Brunsmann, Malte Schilling

Assessing language proficiency is essential for education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts according to the Common European Framework of Reference for Languages (CEFR) into different proficiency levels. To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model and a probing-based approach that utilizes the internal neural state of the LLM for classification. Our results show a consistent performance improvement over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.

en cs.CL, cs.AI
arXiv Open Access 2024
The German Tank Problem with Multiple Factories

Steven J. Miller, Kishan Sharma, Andrew K. Yang

During the Second World War, estimates of the number of tanks deployed by Germany were critically needed. The Allies adopted a successful statistical approach to estimate this information: assume that the tanks are sequentially numbered starting from, say, 1, and ending at an unknown positive integer $N$. If we observe the numbers of $k$ tanks, then the best linear unbiased estimator for $N$ is $M(1+1/k)-1$ where $M$ is the maximum observed serial number. While this approach was successful, there are many more adversarial situations where the approach for the original German Tank Problem falls short. Typically the number of ``factories'' is a possibly unknown $l>1$, and tanks produced by different factories may have serial numbers in disjoint ranges that are often separated by unknown amounts. Clark, Gonye and Miller (CGM) presented an unbiased estimator for $N$ when the minimum serial number is unknown. So if one can identify which samples correspond to which factory, one can then estimate each factory's range using CGM's method, and sum them for an estimate of the rival's total productivity. We present a procedure to estimate the total productivity and prove that it is effective when $\log l/\log k$ is sufficiently small. In the final section, we show that if we have a small number of samples, we can make an estimator that performs orders of magnitude better when given additional information about the size of the gaps.

en math.ST
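
The single-factory estimator quoted in the abstract above, $M(1+1/k)-1$, can be illustrated with a short Python sketch. It covers only the classical formula, not the multi-factory procedure or the CGM estimator the paper discusses. The function names, the simulation parameters, and the sample serial numbers are hypothetical and chosen for illustration.

```python
import random


def german_tank_estimate(serials):
    """Classical estimate of N from observed serial numbers: M * (1 + 1/k) - 1."""
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m * (1 + 1 / k) - 1


def simulate_mean_estimate(n_total, k, trials=20000, seed=0):
    """Average the estimate over many samples drawn without replacement from 1..n_total."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(range(1, n_total + 1), k)
        total += german_tank_estimate(sample)
    return total / trials


if __name__ == "__main__":
    # Hypothetical observation of four serial numbers.
    print(german_tank_estimate([19, 40, 42, 60]))
    # Simulated average for an assumed true N of 1000 and k = 10 observations.
    print(simulate_mean_estimate(1000, 10))
```

With these assumed inputs, the simulated average should land close to the true total, in line with the unbiasedness the abstract states for this estimator.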
arXiv Open Access 2024
Shifting social norms as a driving force for linguistic change: Struggles about language and gender in the German Bundestag

Carolin Müller-Spitzer, Samira Ochs

This paper focuses on language change based on shifting social norms, in particular with regard to the debate on language and gender. It is a recurring argument in this debate that language develops "naturally" and that "severe interventions" - such as gender-inclusive language is often claimed to be - in the allegedly "organic" language system are inappropriate and even "dangerous". Such interventions are, however, not unprecedented. Socially motivated processes of language change are neither unusual nor new. We focus in our contribution on one important political-social space in Germany, the German Bundestag. Taking other struggles about language and gender in the plenaries of the Bundestag as a starting point, our article illustrates that language and gender has been a recurring issue in the German Bundestag since the 1980s. We demonstrate how this is reflected in linguistic practices of the Bundestag, by the use of a) designations for gays and lesbians; b) pair forms such as Bürgerinnen und Bürger (female and male citizens); and c) female forms of addresses and personal nouns ('Präsidentin' in addition to 'Präsident'). Lastly, we will discuss implications of these earlier language battles for the currently very heated debate about gender-inclusive language, especially regarding new forms with gender symbols like the asterisk or the colon (Lehrer*innen, Lehrer:innen; male*female teachers) which are intended to encompass all gender identities.

en cs.CL
arXiv Open Access 2023
Exploring the language of the sharing economy: Building trust and reducing privacy concern on Airbnb in German and English

Alex Zarifis, Richard Ingham, Julia Kroenung

The texts in the profiles of those offering their properties in England in English and in Germany in German are compared to explore whether trust is built and privacy concerns are reduced in the same way. Six methods of building trust are used by the landlords: (1) the level of formality, (2) distance and proximity, (3) emotiveness and humor, (4) being assertive and passive aggressive, (5) conformity to the platform language style and terminology and (6) setting boundaries. Privacy concerns are not usually reduced directly as this is left to the platform. The findings indicate that language has a limited influence and the platform norms and habits are the biggest influence.

en cs.HC, cs.CY
arXiv Open Access 2023
Triple-collinear splittings with massive particles

Prasanna K. Dhani, Germán Rodrigo, German F. R. Sborlini

We analyze in detail the most singular behaviour of processes involving triple-collinear splittings with massive particles in the quasi-collinear limit, and present compact expressions for the splitting amplitudes and the corresponding splitting kernels at the squared-amplitude level. Our expressions fully agree with well-known triple-collinear splittings in the massless limit, which are used as a guide to achieve the final expressions. These results are important to quantify dominant mass effects in many observables, and constitute an essential ingredient of current high-precision computational frameworks for collider phenomenology.

arXiv Open Access 2023
On the Impact of Cross-Domain Data on German Language Models

Amin Dada, Aokun Chen, Cheng Peng et al.

Traditionally, large language models have been either trained on general web crawls or domain-specific data. However, recent successes of generative large language models have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset aimed at containing high-quality data. Through training a series of models ranging between 122M and 750M parameters on both datasets, we conduct a comprehensive benchmark on multiple downstream tasks. Our findings demonstrate that the models trained on the cross-domain dataset outperform those trained on quality data alone, leading to improvements up to $4.45\%$ over the previous state-of-the-art. The models are available at https://huggingface.co/ikim-uk-essen

en cs.CL, cs.AI
arXiv Open Access 2022
The Hitchhiker's Guide to Fused Twins: A Review of Access to Digital Twins in situ in Smart Cities

Jascha Grübel, Tyler Thrash, Leonel Aguilar et al.

Smart Cities already surround us, and yet they are still incomprehensibly far from directly impacting everyday life. While current Smart Cities are often inaccessible, the experience of everyday citizens may be enhanced with a combination of the emerging technologies Digital Twins (DTs) and Situated Analytics. DTs represent their Physical Twin (PT) in the real world via models, simulations, (remotely) sensed data, context awareness, and interactions. However, interaction requires appropriate interfaces to address the complexity of the city. Ultimately, leveraging the potential of Smart Cities requires going beyond assembling the DT to be comprehensive and accessible. Situated Analytics allows for the anchoring of city information in its spatial context. We advance the concept of embedding the DT into the PT through Situated Analytics to form Fused Twins (FTs). This fusion allows access to data in the location that it is generated in an embodied context that can make the data more understandable. Prototypes of FTs are rapidly emerging from different domains, but Smart Cities represent the context with the most potential for FTs in the future. This paper reviews DTs, Situated Analytics, and Smart Cities as the foundations of FTs. Regarding DTs, we define five components (Physical, Data, Analytical, Virtual, and Connection environments) that we relate to several cognates (i.e., similar but different terms) from existing literature. Regarding Situated Analytics, we review the effects of user embodiment on cognition and cognitive load. Finally, we classify existing partial examples of FTs from the literature and address their construction from Augmented Reality, Geographic Information Systems, Building/City Information Models, and DTs and provide an overview of future direction

en cs.CY, cs.GL
arXiv Open Access 2022
A Transfer Learning Based Model for Text Readability Assessment in German

Salar Mohtaj, Babak Naderi, Sebastian Möller et al.

Text readability assessment has a wide range of applications for different target groups, from language learners to people with disabilities. The fast pace of textual content production on the web makes it impossible to measure text complexity without the benefit of machine learning and natural language processing techniques. Although various studies have addressed the readability assessment of English text in recent years, there is still room for improvement of the models for other languages. In this paper, we propose a new model for text complexity assessment of German text based on transfer learning. Our results show that the model outperforms more classical solutions based on linguistic feature extraction from input text. The best model, based on the BERT pre-trained language model, achieved a root mean square error (RMSE) of 0.483.

en cs.CL, cs.AI
arXiv Open Access 2022
Generalizing the German Tank Problem

Anthony Lee, Steven J. Miller

The German Tank Problem dates back to World War II when the Allies used a statistical approach to estimate the number of enemy tanks produced or on the field from observed serial numbers after battles. Assuming that the tanks are labeled consecutively starting from 1, if we observe $k$ tanks from a total of $N$ tanks with the maximum observed tank being $m$, then the best estimate for $N$ is $m(1 + 1/k) - 1$. We explore many generalizations. We looked at the discrete and continuous one dimensional case. We explored different estimators such as the $L$-th largest tank, and applied motivation from portfolio theory and studied a weighted average; however, the original formula was the best. We generalized the problem in two dimensions, with pairs instead of points, studying the discrete and continuous square and circle variants. There were complications from curvature issues and that not every number is representable as a sum of two squares. We often concentrated on the large $N$ limit. For the discrete and continuous square, we tested various statistics, finding the largest observed component did best; the scaling factor for both cases is $(2k+1)/2k$. The discrete case was especially involved because we had to use approximation formulas that gave us the number of lattice points inside the circle. Interestingly, the scaling factors were different for the cases. Lastly, we generalized the problem into $L$ dimensional squares and circles. The discrete and continuous square proved similar to the two dimensional square problem. However, for the $L$-dimensional circle, we had to use formulas for the volume of the $L$-ball, and had to approximate the number of lattice points inside it. The formulas for the discrete circle were particularly interesting, as there was no $L$ dependence in the formula.

en math.PR, math.ST

Page 16 of 424940