Results for "North Germanic. Scandinavian"

CrossRef Open Access 2025

Elda Morlicchio

Etymological research informs our knowledge not only of particular languages but also of the communities of speakers involved and their history more broadly. While it is generally recognized that historical, cultural and societal events can influence the development of languages, how these factors influence linguistic research, and how we think about or perceive particular languages, is often overlooked. Certain domains or themes in Germanic studies seem especially prone to this effect, and the treatment of the Germanic element in Italo-Romance is a case in point.

en

arXiv Open Access 2025

New Encoders for German Trained from Scratch: Comparing ModernGBERT with Converted LLM2Vec Models

Julia Wunderle, Anton Ehrmanntraut, Jan Pfister et al.

Encoders remain essential for efficient German NLP and NLU scenarios despite the rise of decoder-only LLMs. This work studies two routes to high-quality German encoders under identical data and training constraints: 1) training from scratch and 2) converting decoders via LLM2Vec. We introduce two resources: ModernGBERT (134M, 1B), fully transparent German encoders in the ModernBERT style, and LLäMmleinVec (120M, 1B, 7B), decoder-to-encoder conversions trained with masked next-token prediction, both undergoing a context extension to 8,192 tokens. Across SuperGLEBer, ModernGBERT 1B sets a new state of the art (avg 0.808), surpassing GBERT Large (+4%) and the seven-times larger converted 7B model (0.787). On German MTEB after supervised fine-tuning, ModernGBERT 1B (0.551) approaches the converted 7B model (0.557). We release all models, checkpoints, datasets, and full training records, and introduce an encoder-adapted QA-NIAH evaluation. All in all, our results provide actionable guidance: when parameter efficiency and latency matter, from-scratch encoders dominate; when a pre-trained decoder exists and compute is limited, conversion offers an effective alternative. ModernGBERT and LLäMmleinVec, including all code, data and intermediate checkpoints, are published under a research-only RAIL license.

en cs.CL, cs.AI

arXiv Open Access 2025

Do Construction Distributions Shape Formal Language Learning In German BabyLMs?

Bastian Bunzeck, Daniel Duran, Sina Zarrieß

We analyze the influence of utterance-level construction distributions in German child-directed/child-available speech on the resulting word-level, syntactic and semantic competence (and their underlying learning trajectories) in small LMs, which we train on a novel collection of developmentally plausible language data for German. We find that trajectories are surprisingly robust for markedly different distributions of constructions in the training data, which have little effect on final accuracies and almost no effect on global learning trajectories. While syntax learning benefits from more complex utterances, word-level learning culminates in better scores with more fragmentary utterances. We argue that LMs trained on developmentally plausible data can contribute to debates on how conducive different kinds of linguistic stimuli are to language learning.

en cs.CL

arXiv Open Access 2025

Large Language Model Data Generation for Enhanced Intent Recognition in German Speech

Theresa Pekarek Rosin, Burak Can Kaplan, Stefan Wermter

Intent recognition (IR) for speech commands is essential for artificial intelligence (AI) assistant systems; however, most existing approaches are limited to short commands and are predominantly developed for English. This paper addresses these limitations by focusing on IR from speech by elderly German speakers. We propose a novel approach that combines an adapted Whisper ASR model, fine-tuned on elderly German speech (SVC-de), with Transformer-based language models trained on synthetic text datasets generated by three well-known large language models (LLMs): LeoLM, Llama3, and ChatGPT. To evaluate the robustness of our approach, we generate synthetic speech with a text-to-speech model and conduct extensive cross-dataset testing. Our results show that synthetic LLM-generated data significantly boosts classification performance and robustness to different speaking styles and unseen vocabulary. Notably, we find that LeoLM, a smaller, domain-specific 13B LLM, surpasses the much larger ChatGPT (175B) in dataset quality for German intent recognition. Our approach demonstrates that generative AI can effectively bridge data gaps in low-resource domains. We provide detailed documentation of our data generation and training process to ensure transparency and reproducibility.

en cs.CL, cs.LG

arXiv Open Access 2024

Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

Ahmad Idrissi-Yaghir, Amin Dada, Henning Schäfer et al.

Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models perform remarkably well on general datasets, they can struggle in specialized domains such as medicine, where domain-specific terminology, abbreviations, and varying document structures are common. This paper explores strategies for adapting these models to domain-specific requirements, primarily through continuous pre-training on domain-specific data. We pre-trained several German medical language models on 2.4B tokens derived from translated public English medical data and 3B tokens of German clinical data. The resulting models were evaluated on various German downstream tasks, including named entity recognition (NER), multi-label classification, and extractive question answering. Our results suggest that models augmented by clinical and translation-based pre-training typically outperform general-domain models in medical contexts. We conclude that continuous pre-training can match or even exceed the performance of clinical models trained from scratch. Furthermore, both pre-training on clinical data and leveraging translated texts have proven to be reliable methods for domain adaptation in medical NLP tasks.

en cs.CL, cs.AI

arXiv Open Access 2024

A 22 percent increase in the German minimum wage: nothing crazy!

Mario Bossler, Lars Chittka, Thorsten Schank

We present the first empirical evidence on the 22 percent increase in the German minimum wage implemented in 2022, which raised it from €9.82 to €10.45 in July and to €12 in October. Leveraging the German Earnings Survey, a large and novel data source comprising around 8 million employee-level observations reported by employers each month, we apply a difference-in-difference-in-differences approach to analyze the policy's impact on hourly wages, monthly earnings, employment, and working hours. Our findings reveal significant positive effects on wages, affirming the policy's intended benefits for low-wage workers. Interestingly, we identify a negative effect on working hours, mainly driven by minijobbers. The hours effect implies a labor demand elasticity, in terms of employment volume, of -0.2, which only partially offsets the monthly wage gains. We observe no negative effect on either individual employment retention or regional employment levels.
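
The headline figure can be checked from the three wage levels quoted above. A minimal sketch, assuming simple percentage change relative to the starting level (the helper name `pct_increase` is illustrative, not from the paper): the cumulative rise from €9.82 to €12 is about 22.2%, made up of roughly 6.4% in July and 14.8% in October, and the two steps compound rather than add.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage change from `old` to `new`, relative to `old`."""
    return (new - old) / old * 100

# Hourly minimum wage levels (euros) as quoted in the abstract.
BEFORE, JULY_2022, OCT_2022 = 9.82, 10.45, 12.00

july_step = pct_increase(BEFORE, JULY_2022)       # first step
october_step = pct_increase(JULY_2022, OCT_2022)  # second step
total = pct_increase(BEFORE, OCT_2022)            # cumulative change

# Compounding the two steps reproduces the cumulative change.
compounded = ((1 + july_step / 100) * (1 + october_step / 100) - 1) * 100

print(f"July: {july_step:.1f}%, October: {october_step:.1f}%, total: {total:.1f}%")
print(f"compounded steps: {compounded:.1f}%")
```

Adding the two step sizes (about 21.2%) would understate the total, which is why the compounded figure is shown alongside.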

en econ.GN

arXiv Open Access 2023

On the State of German (Abstractive) Text Summarization

Dennis Aumiller, Jing Fan, Michael Gertz

With recent advancements in Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including for German. Text summarization systems, which transform long input documents into compressed and more digestible summaries, are especially practical for businesses analyzing their growing amounts of textual data. In this work, we assess the particular landscape of German abstractive text summarization and investigate why practically useful solutions for abstractive text summarization are still absent in industry. Our focus is two-fold, analyzing a) training resources, and b) publicly available summarization systems. We show that popular existing datasets exhibit crucial flaws in their assumptions about the original sources, which frequently lead to detrimental effects on system generalization and to evaluation biases. We confirm that for the most popular training dataset, MLSUM, over 50% of the training set is unsuitable for abstractive summarization purposes. Furthermore, available systems frequently fail to compare against simple baselines and ignore more effective and efficient extractive summarization approaches. We attribute poor evaluation quality to a variety of factors, which are investigated in more detail in this work: a lack of high-quality (and diverse) gold data for training, understudied (and untreated) positional biases in some of the existing datasets, and the lack of easily accessible and streamlined pre-processing strategies or analysis tools. We provide a comprehensive assessment of available models on the cleaned datasets and find that this can lead to a reduction of more than 20 ROUGE-1 points during evaluation. The code for dataset filtering and reproducing results can be found online at https://github.com/dennlinger/summaries
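
For readers unfamiliar with the metric behind the "20 ROUGE-1 points" figure: ROUGE-1 measures unigram overlap between a system summary and a reference summary and is commonly reported as an F-score on a 0-100 scale. The sketch below is a simplified version, assuming lowercased whitespace tokenization and no stemming; established implementations tokenize and optionally stem differently, so their scores will not match exactly. The function name `rouge1` is illustrative.

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict[str, float]:
    """Simplified ROUGE-1: unigram precision, recall and F1 (no stemming)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per word, i.e. clipped matches.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1(
    "the minimum wage rises to twelve euros",
    "minimum wage rises to twelve euros in october",
)
# Scale to the 0-100 range in which ROUGE is usually reported.
print({name: round(value * 100, 1) for name, value in scores.items()})
```

Here 6 of the 8 candidate words match the 7-word reference, giving precision 0.75, recall 6/7 and F1 0.8 (80 points).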

en cs.CL

arXiv Open Access 2022

ASR in German: A Detailed Error Analysis

Johannes Wirth, Rene Peinl

The number of freely available systems for automatic speech recognition (ASR) based on neural networks is growing steadily, with increasingly reliable predictions. However, trained models are typically evaluated exclusively with statistical metrics such as WER or CER, which provide no insight into the nature or impact of the errors produced when predicting transcripts from speech input. This work presents a selection of ASR model architectures pretrained on German and evaluates them on a benchmark of diverse test datasets. It identifies cross-architectural prediction errors, classifies them into categories and traces the sources of errors per category back to the training data and other sources. Finally, it discusses solutions for creating qualitatively better training datasets and more robust ASR systems.
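
The criticism of WER and CER above is easier to see with the metrics written out. A minimal sketch, assuming the usual definition (word- or character-level Levenshtein distance divided by the reference length); the helper names are illustrative. The two hypotheses below receive the same WER even though one swaps a noun (hund → bund) and the other only drops an inflectional ending (einen → ein), exactly the kind of difference a single score hides.

```python
from typing import Sequence

def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)

reference = "ich habe einen hund"
for hypothesis in ("ich habe einen bund", "ich habe ein hund"):
    print(hypothesis, f"WER={wer(reference, hypothesis):.2f}", f"CER={cer(reference, hypothesis):.3f}")
```

Both hypotheses score a WER of 0.25, and the CER even ranks the meaning-changing error (1/19) as milder than the grammatical one (2/19).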

en cs.CL, cs.AI

CrossRef Open Access 2021

Verb-final conjunct clauses in Old English prose

Anna Cichosz

The aim of this study is to analyse intertextual differences in the use of V-final order in Old English conjunct clauses and to determine to what extent the source of these differences may be Latin influence. The analysis reveals that the frequency of V-final order in OE conjuncts is rather limited in most texts, and Bede’s Historia Ecclesiastica surfaces as the text in which the frequency of V-final conjunct clauses is exceptionally high. The study shows that the regular use of V-final order in Bede may be interpreted as a translation effect, with Latin inflating the frequency of the pattern in conjunct clauses, which means that the frequency of V-final conjunct clauses in early OE translations may not reflect native tendencies.

1 citation en

arXiv Open Access 2021

A Unified Model for the Fan Region and the North Polar Spur: A bundle of filaments in the Local Galaxy

J. L. West, T. L. Landecker, B. M. Gaensler et al.

We present a simple, unified model that can explain two of the brightest, large-scale, diffuse, polarized radio features in the sky, the North Polar Spur (NPS) and the Fan Region, along with several other prominent loops. We suggest that they are long, magnetized, and parallel filamentary structures that surround the Local arm and/or Local Bubble, in which the Sun is embedded. We show this model is consistent with the large number of observational studies on these regions, and is able to resolve an apparent contradiction in the literature that suggests the high latitude portion of the NPS is nearby, while lower latitude portions are more distant. Understanding the contributions of this local emission is critical to developing a complete model of the Galactic magnetic field. These very nearby structures also provide context to help understand similar non-thermal, filamentary structures that are increasingly being observed with modern radio telescopes.

en astro-ph.GA

arXiv Open Access 2020

Long-term Periodicities in North-South Asymmetry of Solar Activity and Alignments of the Giant Planets

J. Javaraiah

The existence of ~12-year and ~51-year periodicities in the north-south asymmetry of solar activity is well known. However, the origin of these, as well as of the well-known relatively short periodicities in the north-south asymmetry, is not yet clear. Here we have analyzed the combined daily data of sunspot groups reported in Greenwich Photoheliographic Results (GPR) and Debrecen Photoheliographic Data (DPD) during the period 1874-2017 and the data of the orbital positions (ecliptic longitudes) of the giant planets in ten-day intervals during the period 1600-2099. Our analysis suggests that the ~12-year and ~51-year periodicities in the north-south asymmetry of solar activity are manifestations of the differences in the strengths of the ~11-year and ~51-year periodicities of activity in the northern and southern hemispheres. During the period 1874-2017 the Morlet wavelet power spectra of the north-south asymmetry of sunspot-group area and of the mean absolute difference of the orbital positions of the giant planets are found to be similar. In particular, there is a suggestion that the ~12-year and ~51-year periodicities in the north-south asymmetry of sunspot-group area occurred at approximately the same times as the corresponding periodicities in the mean absolute difference of the orbital positions of the giant planets. Therefore, we suggest that specific configurations of the giant planets could influence the origin of the ~12-year and ~50-year periodicities of the north-south asymmetry of solar activity.

en astro-ph.SR

arXiv Open Access 2018

Tropical transition of Hurricane Chris (2012) over the North Atlantic Ocean: A multi-scale investigation of predictability

Michael Maier-Gerber, Michael Riemer, Andreas H. Fink et al.

Tropical cyclones that evolve from a non-tropical origin may pose a special challenge for predictions, as they often emerge at the end of a multi-scale cascade of atmospheric processes. Climatological studies have shown that the 'tropical transition' (TT) pathway plays a prominent role in cyclogenesis, in particular over the North Atlantic Ocean. Here we use operational European Centre for Medium-Range Weather Forecasts ensemble predictions to investigate the TT of North Atlantic Hurricane Chris (2012), whose formation was preceded by the merger of two potential vorticity (PV) maxima, eventually resulting in the storm-inducing PV streamer. The principal goal is to elucidate the dynamic and thermodynamic processes governing the predictability of cyclogenesis and subsequent TT. Dynamic time warping is applied to identify ensemble tracks that are similar to the analysis track. This technique permits small temporal and spatial shifts in the development. The formation of the pre-Chris cyclone is predicted by those members that also predict the merging of the two PV maxima. The position of the storm relative to the PV streamer determines whether the pre-Chris cyclone follows the TT pathway. The transitioning storms are located inside a favorable region of high equivalent potential temperatures that result from a warm seclusion underneath the cyclonic roll-up of the PV streamer. A systematic investigation of consecutive ensemble forecasts indicates that forecast improvements are linked to specific events, such as the PV merging. The present case exemplifies how a novel combination of Eulerian and Lagrangian ensemble forecast analysis tools allows one to infer the physical causes of abrupt changes in predictability.
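
Dynamic time warping, used above to match ensemble tracks to the analysis track, can be sketched as follows. This is the textbook, unconstrained DTW with Euclidean distance between positions treated as planar coordinates; the paper's actual cost function, window constraints and any similarity threshold are not described here, and all names below are hypothetical. The example shows how DTW tolerates a temporal shift: a member track lagging the analysis by one time step scores 1.0, whereas a lock-step, point-by-point comparison scores 3.0.

```python
import math

Point = tuple[float, float]  # hypothetical (x, y) storm-centre position

def dtw_distance(a: list[Point], b: list[Point]) -> float:
    """Classic dynamic time warping distance between two tracks."""
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b waits
                cost[i][j - 1],      # b advances, a waits
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

def lockstep_distance(a: list[Point], b: list[Point]) -> float:
    """Point-by-point distance at equal time indices (no warping)."""
    return sum(math.dist(p, q) for p, q in zip(a, b))

analysis = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
lagged_member = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(dtw_distance(analysis, lagged_member))       # 1.0
print(lockstep_distance(analysis, lagged_member))  # 3.0
```

The warping path lets the lagged member "wait" at its first position, so only the final, not-yet-reached point contributes to the distance.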

en physics.ao-ph

arXiv Open Access 2015

Some theories beyond the Standard Model

Alan S. Cornell

A brief review of physics beyond the Standard Model is given, as presented at the High Energy Particle Physics workshop on 12 February 2015 at the iThemba North Labs. Particular emphasis is given to the Minimal Supersymmetric Standard Model, with extra-dimensional theories also mentioned.

en hep-ph

arXiv Open Access 2014

Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) north polar springtime recession mapping: First Three Mars years of observations

Adrian J. Brown, Wendy M. Calvin, Scott L. Murchie

We report on mapping of the north polar region of Mars using data from the Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) instrument. We have observed three Mars Years (28-30) of late-winter and spring recessions (Ls=304°-92°). Our investigations have led to the following observations: 1. We classify the retreat of the north polar seasonal cap into 'pre-sublimation', 'early spring', 'asymmetric' and 'stable' periods according to the prevalent H2O ice grain size distributions. 2. During the early spring, the signatures of CO2 ice at the edge of the cap are obscured by H2O ice, which increases the apparent size of the H2O ice annulus around the seasonal CO2 cap at this time. At around Ls=25°, this process changes into an asymmetrical distribution of H2O deposition, covering CO2 signatures more rapidly in the longitude range from 90-210°E. 3. We detect signatures of 'pure' CO2 ice in extremely limited locations (in Lomonosov Crater) even in mid winter. H2O ice signatures appear everywhere in the retreating CO2 seasonal cap, in contrast with the south polar seasonal cap. 4. We find that average H2O ice grain sizes continuously increase from northern mid-winter to the end of springtime - this is the inverse of the behavior of CO2 ice grain sizes in the southern springtime.

en astro-ph.EP

arXiv Open Access 2013

The wobbly Galaxy: kinematics north and south with RAVE red clump giants

M. E. K. Williams, M. Steinmetz, J. Binney et al.

The RAVE survey, combined with proper motions and distance estimates, can be used to study in detail stellar kinematics in the extended solar neighbourhood (solar suburb). Using the red clump, we examine the mean velocity components in 3D between an R of 6 and 10 kpc and a Z of -2 to 2 kpc, concentrating on North-South differences. Simple parametric fits to the R, Z trends for VPHI and the velocity dispersions are presented. We confirm the recently discovered gradient in mean Galactocentric radial velocity, VR, finding that the gradient is more marked below the plane, with a Z gradient also present. The vertical velocity, VZ, also shows clear structure, with indications of a rarefaction-compression pattern, suggestive of wave-like behaviour. We perform a rigorous error analysis, tracing sources of both systematic and random errors. We confirm the North-South differences in VR and VZ along the line-of-sight, with the VR estimated independent of the proper motions. The complex three-dimensional structure of velocity space presents challenges for future modelling of the Galactic disk, with the Galactic bar, spiral arms and excitation of wave-like structures all probably playing a role.

en astro-ph.GA

arXiv Open Access 2010

PTF10nvg: An Outbursting Class I Protostar in the Pelican/North American Nebula

Kevin R. Covey, Lynne A. Hillenbrand, Adam A. Miller et al.

During a synoptic survey of the North American Nebula region, the Palomar Transient Factory (PTF) detected an optical outburst (dubbed PTF10nvg) associated with the previously unstudied flat or rising spectrum infrared source IRAS 20496+4354. The PTF R-band light curve reveals that PTF10nvg brightened by more than 5 mag during the current outburst, rising to a peak magnitude of R~13.5 in 2010 Sep. Follow-up observations indicate PTF10nvg has undergone a similar ~5 mag brightening in the K band, and possesses a rich emission-line spectrum, including numerous lines commonly assumed to trace mass accretion and outflows. Many of these lines are blueshifted by ~175 km/s from the North American Nebula's rest velocity, suggesting that PTF10nvg is driving an outflow. Optical spectra of PTF10nvg show several TiO/VO bandheads fully in emission, indicating the presence of an unusual amount of dense (> 10^10 cm^-3), warm (1500-4000 K) circumstellar material. Near-infrared spectra of PTF10nvg appear quite similar to a spectrum of McNeil's Nebula/V1647 Ori, a young star which has undergone several brightenings in recent decades, and 06297+1021W, a Class I protostar with a similarly rich near-infrared emission line spectrum. While further monitoring is required to fully understand this event, we conclude that the brightening of PTF10nvg is indicative of enhanced accretion and outflow in this Class-I-type protostellar object, similar to the behavior of V1647 Ori in 2004-2005.
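
To put the "more than 5 mag" brightening in flux terms: by the standard (Pogson) magnitude relation, a magnitude difference corresponds to a flux ratio of 10^(0.4 Δm), so a 5 mag rise means the R-band flux increased by more than a factor of 100. This is a general conversion, not a calculation reported in the paper.

```latex
\Delta m \equiv m_{\mathrm{before}} - m_{\mathrm{after}}
  = 2.5 \log_{10}\!\frac{F_{\mathrm{after}}}{F_{\mathrm{before}}}
\quad\Longrightarrow\quad
\frac{F_{\mathrm{after}}}{F_{\mathrm{before}}} = 10^{0.4\,\Delta m}
  = 10^{0.4 \times 5} = 100
```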

en astro-ph.SR

arXiv Open Access 2009

A Two-Colour CCD Survey of the North Celestial Cap: I. The Method

Evgeny Gorbikov, Noah Brosch, Cristina Afonso

We describe technical aspects of an astrometric and photometric survey of the North Celestial Cap (NCC), from the Pole (DEC=90 deg) to DEC=80 deg, in support of the TAUVEX mission. This region, at galactic latitudes from ~ 17 deg to ~ 37 deg, has poor coverage in modern CCD-based surveys. The observations are performed with the Wise Observatory one-meter reflector and with a new mosaic CCD camera (LAIWO) that images in the Johnson-Cousins R and I bands a one-square-degree field with subarcsec pixels. The images are treated using IRAF and SExtractor to produce a final catalogue of sources. The astrometry, based on the USNO-A2.0 catalogue, is good to ~ 1 arcsec and the photometry is good to ~ 0.1 mag for point sources brighter than R=20.0 or I=19.1 mag. The limiting magnitudes of the survey, defined at photometric errors smaller than 0.15 mag, are 20.6 mag (R) and 19.6 mag (I). We separate stars from non-stellar objects based on the object shapes in the R and I bands, attempting to reproduce the SDSS star/galaxy dichotomy. The completeness test indicates that the catalogue is complete to the limiting magnitudes.

en astro-ph.IM, astro-ph.GA

arXiv Open Access 2008

First Stellar Velocity Dispersion Measurement of a Luminous Quasar Host with Gemini North Laser Guide Star Adaptive Optics

Linda C. Watson, Paul Martini, Kalliopi M. Dasyra et al.

We present the first use of the Gemini North laser guide star adaptive optics (LGS AO) system and an integral field unit (IFU) to measure the stellar velocity dispersion of the host of a luminous quasar. The quasar PG1426+015 (z=0.086) was observed with the Near-Infrared Integral Field Spectrometer (NIFS) on the 8m Gemini North telescope in the H-band as part of the Science Verification phase of the new ALTAIR LGS AO system. The NIFS IFU and LGS AO are well suited for host studies of luminous quasars because one can achieve a large ratio of host to quasar light. We have measured the stellar velocity dispersion of PG1426+015 from 0.1'' to 1'' (0.16 kpc to 1.6 kpc) to be 217 ± 15 km/s based on high signal-to-noise ratio measurements of Si I, Mg I, and several CO bandheads. This new measurement is a factor of four more precise than a previous measurement obtained with long-slit spectroscopy and good, natural seeing, yet was obtained with a shorter net integration time. We find that PG1426+015 has a velocity dispersion that places it significantly above the M-sigma relation of quiescent galaxies and lower-luminosity active galactic nuclei with black hole masses estimated from reverberation mapping. We discuss several possible explanations for this discrepancy that could be addressed with similar observations of a larger sample of luminous quasars.

en astro-ph

arXiv Open Access 1998

Linear spectropolarimetry of Ap stars: a new degree of constraint on magnetic structure

G. A. Wade, J. -F. Donati, G. Mathys et al.

We present preliminary results from a programme aimed at acquiring linear spectropolarimetry of magnetic A and B stars. Linear polarization in the spectral lines of these objects is due to the Zeeman effect, and should provide detailed new information regarding the structure of their strong magnetic fields. To illustrate the impact of these new data, we compare observed circular and linear polarization line profiles of 53 Cam with the profiles predicted by the magnetic model of Landstreet (1988). Linear polarization in the spectral lines of all stars studied is extremely weak; in most cases, it is below the threshold of detectability even at very high SNRs. To overcome this problem, we employ the Least-Squares Deconvolution (LSD) multi-line analysis technique to extract low-noise mean line profiles and polarization signatures from our echelle spectra. Tests show that these mean signatures can be modelled as real spectral lines, and have the potential to lead to high-resolution maps of the magnetic and chemical abundance surface distributions.

en astro-ph

arXiv Open Access 2004

W49A North - Global or Local or No Collapse?

John A. Williams, Helene R. Dickel, Lawrence H. Auer

We attempt to fit observations with 5" resolution of the J=2-1 transition of CS in the directions of H II regions A, B, and G of W49A North as well as observations with 20" resolution of the J=2-1, 3-2, 5-4, and 7-6 transitions in the directions of H II regions A and G by using radiative transfer calculations. These calculations predict the intensity profiles resulting from several spherical clouds along the line of sight. We consider three models: global collapse of a very large (5 pc radius) cloud, localized collapse from smaller (1 pc) clouds around individual H II regions, and multiple, static clouds. For all three models we can find combinations of parameters that reproduce the CS profiles reasonably well provided that the component clouds have a core-envelope structure with a temperature gradient. Cores with high temperature and high molecular hydrogen density are needed to match the higher transitions (e.g. J=7-6) observed towards A and G. The lower temperature, low density gas needed to create the inverse P-Cygni profile seen in the CS J=2-1 line (with 5" beam) towards H II region G arises from different components in the 3 models. The infalling envelope of cloud G plus cloud B creates the absorption in global collapse, cloud B is responsible in local collapse, and a separate cloud, G', is needed in the case of many static clouds. The exact nature of the velocity field in the envelopes for the case of local collapse is not important as long as it is in the range of 1 to 5 km/s for a turbulent velocity of about 6 km/s. High resolution observations of the J=1-0 and 5-4 transitions of CS and C34S may distinguish between these three models. Modeling existing observations of HCO+ and C18O does not allow one to distinguish between the three models but does indicate the existence of a bipolar outflow.

en astro-ph
