Results for "North Germanic. Scandinavian"

Showing 20 of ~949009 results · from arXiv, Semantic Scholar, DOAJ, CrossRef

arXiv Open Access 2025
Host galaxy identification of LOFAR sources in the Euclid Deep Field North

Bisigello L., Giulietti M., Prandoni I. et al.

We present a catalogue of optical and near-infrared counterparts to radio sources detected in the Euclid Deep Field North (EDF-N) using observations from the LOw-Frequency ARray (LOFAR) High Band Antenna (HBA) at 144 MHz with 6 arcsec angular resolution. The catalogue covers a circular region of $10\,\mathrm{deg}^2$ and includes 23309 radio sources with a peak signal-to-noise ratio greater than 5. After masking regions close to stars and with unreliable photometry in the optical or near-infrared, the catalogue includes 19550 sources. To carry out a robust identification strategy, we combined the statistical power of the Likelihood Ratio (LR) method, including both colour and magnitude information, with targeted visual inspection. The resulting catalogue boasts a remarkable identification rate of 99.2%, successfully matching 19401 out of 19550 radio sources with reliable optical and/or near-infrared counterparts. For 19391 of the matched sources, we successfully derived photometric redshift for the host galaxy by performing an SED fit using the available data in the optical, near-infrared, far-infrared, and radio. LOFAR sources within the catalogue exhibit a median redshift of 1.1, with some extending up to z=6. Around 7% of the sample is detected only in infrared using IRAC and tends towards higher redshifts, with a median of z=3.0. This comprehensive catalogue serves as a valuable resource for future research, enabling detailed investigations into the properties and evolution of LOFAR-detected sources and their host galaxies.

en astro-ph.GA
arXiv Open Access 2024
Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study

Bolei Ma, Berk Yoztyurk, Anna-Carolina Haensch et al.

In recent research, large language models (LLMs) have been increasingly used to investigate public opinions. This study investigates the algorithmic fidelity of LLMs, i.e., the ability to replicate the socio-cultural context and nuanced opinions of human participants. Using open-ended survey data from the German Longitudinal Election Studies (GLES), we prompt different LLMs to generate synthetic public opinions reflective of German subpopulations by incorporating demographic features into the persona prompts. Our results show that Llama performs better than other LLMs at representing subpopulations, particularly when there is lower opinion diversity within those groups. Our findings further reveal that the LLM performs better for supporters of left-leaning parties like The Greens and The Left compared to other parties, and matches the least with the right-party AfD. Additionally, the inclusion or exclusion of specific variables in the prompts can significantly impact the models' predictions. These findings underscore the importance of aligning LLMs to more effectively model diverse public opinions while minimizing political biases and enhancing robustness in representativeness.

en cs.CL
arXiv Open Access 2024
SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments

Kai-Robin Lange, Carsten Jentsch

The application of natural language processing on political texts as well as speeches has become increasingly relevant in political sciences due to the ability to analyze large text corpora which cannot be read by a single person. But such text corpora often lack critical meta information, detailing for instance the party, age or constituency of the speaker, that can be used to provide an analysis tailored to more fine-grained research questions. To enable researchers to answer such questions with quantitative approaches such as natural language processing, we provide the SpeakGer data set, consisting of German parliament debates from all 16 federal states of Germany as well as the German Bundestag from 1947-2023, split into a total of 10,806,105 speeches. This data set includes rich meta data in form of information on both reactions from the audience towards the speech as well as information about the speaker's party, their age, their constituency and their party's political alignment, which enables a deeper analysis. We further provide three exploratory analyses, detailing topic shares of different parties throughout time, a descriptive analysis of the development of the age of an average speaker as well as a sentiment analysis of speeches of different parties with regards to the COVID-19 pandemic.

en cs.CL
arXiv Open Access 2024
Revisiting the Phenomenon of Syntactic Complexity Convergence on German Dialogue Data

Yu Wang, Hendrik Buschmeier

We revisit the phenomenon of syntactic complexity convergence in conversational interaction, originally found for English dialogue, which has theoretical implication for dialogical concepts such as mutual understanding. We use a modified metric to quantify syntactic complexity based on dependency parsing. The results show that syntactic complexity convergence can be statistically confirmed in one of three selected German datasets that were analysed. Given that the dataset which shows such convergence is much larger than the other two selected datasets, the empirical results indicate a certain degree of linguistic generality of syntactic complexity convergence in conversational interaction. We also found a different type of syntactic complexity convergence in one of the datasets while further investigation is still necessary.

en cs.CL
arXiv Open Access 2023
Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training

Miriam Anschütz, Joshua Oehms, Thomas Wimmer et al.

Automatic text simplification systems help to reduce textual information barriers on the internet. However, for languages other than English, only little parallel data exists to train these systems. We propose a two-step approach to overcome this data scarcity issue. First, we fine-tuned language models on a corpus of German Easy Language, a specific style of German. Then, we used these models as decoders in a sequence-to-sequence simplification task. We show that the language models adapt to the style characteristics of Easy Language and output more accessible texts. Moreover, with the style-specific pre-training, we reduced the number of trainable parameters in text simplification models. Hence, less parallel data is sufficient for training. Our results indicate that pre-training on unaligned data can reduce the required parallel data while improving the performance on downstream tasks.

arXiv Open Access 2023
Preliminary Results of a Scientometric Analysis of the German Information Retrieval Community 2020-2023

Philipp Schaer, Svetlana Myshkina, Jüri Keller

The German Information Retrieval community is located in two different sub-fields: Information and computer science. There are no current studies that investigate these communities on a scientometric level. Available studies only focus on the information scientific part of the community. We generated a data set of 401 recent IR-related publications extracted from six core IR conferences from a mainly computer scientific background. We analyze this data set at the institutional and researcher level. The data set is publicly released, and we also demonstrate a mapping use case.

en cs.IR, cs.DL
arXiv Open Access 2023
On the Impact of Cross-Domain Data on German Language Models

Amin Dada, Aokun Chen, Cheng Peng et al.

Traditionally, large language models have been trained either on general web crawls or on domain-specific data. However, recent successes of generative large language models have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset aimed at containing high-quality data. Through training a series of models ranging between 122M and 750M parameters on both datasets, we conduct a comprehensive benchmark on multiple downstream tasks. Our findings demonstrate that the models trained on the cross-domain dataset outperform those trained on quality data alone, leading to improvements up to $4.45\%$ over the previous state-of-the-art. The models are available at https://huggingface.co/ikim-uk-essen

en cs.CL, cs.AI
arXiv Open Access 2023
The Greggs-Pret Index: a Machine Learning analysis of consumer habits as a metric for the socio-economic North-South divide in England

Robin Smith, Kristian C. Z. Haverson

In England, it is anecdotally remarked that the number of Greggs bakeries to be found in a town is a reliable measure of the area's 'Northern-ness'. Conversely, a commercial competitor to Greggs in the baked goods and sandwiches market, Pret-a-Manger, is reputed to be popular in more 'southern' areas of England. Using a Support Vector Machine and an Artificial Neural Network (ANN) Regression Model, the relative geographical distributions of Greggs and Pret have been utilised for the first time to quantify the North-South divide in England. The calculated dividing lines were each compared to another line, based on Gross Domestic Household Income (GDHI). The lines match remarkably well, and we conclude that this is likely because much of England's wealth is concentrated in London, as are most of England's Pret-a-Manger shops. Further studies were conducted based on the relative geographical distributions of popular supermarkets Morrisons and Waitrose, which are also considered to have a North-South association. This analysis yields different results. For all metrics, the North-South dividing line passes close to the M1 Watford Gap services. As a common British idiom, this location is oft quoted as one point along the English North-South divide, and it is notable that this work agrees. This tongue-in-cheek analysis aims to draw attention to more serious factors underlying the North-South divide, such as life expectancy, education, and poverty.

en stat.AP
arXiv Open Access 2022
Swiss German Speech to Text system evaluation

Yanick Schraner, Christian Scheller, Michel Plüss et al.

We present an in-depth evaluation of four commercially available Speech-to-Text (STT) systems for Swiss German. The systems are anonymized and referred to as system a-d in this report. We compare the four systems to our STT model, referred to hereafter as FHNW, and provide details on how we trained our model. To evaluate the models, we use two STT datasets from different domains: the Swiss Parliament Corpus (SPC) test set and a private dataset in the news domain with an even distribution across seven dialect regions. We provide a detailed error analysis to detect the three systems' strengths and weaknesses. This analysis is limited by the characteristics of the two test sets. Our model scored the highest bilingual evaluation understudy (BLEU) on both datasets. On the SPC test set, we obtain a BLEU score of 0.607, whereas the best commercial system reaches a BLEU score of 0.509. On our private test set, we obtain a BLEU score of 0.722 and the best commercial system a BLEU score of 0.568.

en cs.CL, cs.AI
arXiv Open Access 2022
Development of a test in German language to assess middle school students' physics proficiency

Markus Sebastian Feser, Dietmar Höttecke

This short contribution reports the development of a test for assessing middle school students' physics proficiency via multiple-choice single-select items in German language. The test assesses students' content and procedural knowledge across various content areas that are typical of physics education at the middle-school level and is based on adapted items developed within the Third International Mathematics and Science Study (TIMSS). We report the study design we used to develop this test, as well as the results and selected parameters regarding the test's psychometric quality.

en physics.ed-ph
CrossRef Open Access 2021
Familiar vs. unique in a diachronic perspective. Case study of the rise of the definite article in North Germanic

Alicja Piotrowska, Dominika Skrzypek

The aim of the present study is to follow the development of the suffixed definite article in North Germanic, in particular taking into account the unique reference expressed by the nascent article. The study is based on the corpora of Old Swedish, Old Danish and Old Icelandic texts written between 1200 and 1550. Both qualitative and quantitative methods, such as logistic regression models, are applied. The study is grounded in the notions of familiarity and uniqueness, which we explore diachronically. The results indicate that the use of the definite article is much more frequent in familiar than in unique contexts in North Germanic in the periods studied, as a greater proportion of NPs with direct anaphors is definite in the oldest extant texts, as well as throughout the later periods, than the proportion of NPs with unique referents. NPs with unique referents are further shown to constitute a non-uniform group, where the ‘more local’ unique NPs (grounded in specific knowledge) appear more frequently with a definite article than the ‘more global’ unique referents (grounded in encyclopaedic knowledge).

arXiv Open Access 2021
On the role of tournament design in sporting success: A study of the North, Central American and Caribbean qualification for the 2022 FIFA World Cup

László Csató

Playing in the FIFA World Cup finals is an ambition shared by several nations. Since, besides luck and skill, the probability of qualification depends on the design of the qualifiers, the study of these competitions forms an integral part of sports analytics. The Confederation of North, Central America and Caribbean Association Football (CONCACAF) announced a novel qualifying format for the 2022 FIFA World Cup in July 2019. However, the COVID-19 pandemic forced the organisers to return to a more traditional structure. The present chapter analyses how this reform impacted the chances of the national teams to qualify. It is found that the probability of participating in the FIFA World Cup finals can change by more than 5 percentage points under the assumption of fixed strengths for the teams. The idea behind the original design, to divide the contestants into two distinct sets, is worth considering due to the increased competitiveness of the matches played by the strongest and the weakest teams. We recommend mitigating the sharp nonlinearity caused by the seeding policy via a probabilistic rule to the analogy of the NBA draft lottery system.

en physics.soc-ph
arXiv Open Access 2021
GERNERMED -- An Open German Medical NER Model

Johann Frei, Frank Kramer

The current state of adoption of well-structured electronic health records and integration of digital methods for storing medical patient data in structured formats can often be considered inferior compared to the use of traditional, unstructured text-based patient data documentation. Data mining in the field of medical data analysis often needs to rely solely on processing of unstructured data to retrieve relevant data. In natural language processing (NLP), statistical models have been shown to be successful in various tasks like part-of-speech tagging, relation extraction (RE) and named entity recognition (NER). In this work, we present GERNERMED, the first open, neural NLP model for NER tasks dedicated to detecting medical entity types in German text data. Here, we avoid the conflicting goals of protection of sensitive patient data from training data extraction and the publication of the statistical model weights by training our model on a custom dataset that was translated from publicly available datasets in a foreign language by a pretrained neural machine translation model. The sample code and the statistical model are available at: https://github.com/frankkramer-lab/GERNERMED

en cs.CL, cs.AI
S2 Open Access 2020
Northern Connections: Interregional Contacts in Bronze Age Northern and Middle Sweden

Karin Ojala, Carl-Gösta Ojala

This article examines northern connections in the Nordic Bronze Age, focusing on interregional contacts in middle and northern Sweden. In the article, we argue that it is important to incorporate a northern perspective in the discussions about the Scandinavian Bronze Age and its networks. We focus on the Mälaren Valley region, especially the province of Uppland, and the northern parts of Sweden, in particular the coastal areas of northern Sweden. We discuss some aspects of the archaeological material, which have been used in earlier discussions of interregional contacts in middle and northern Sweden during the Bronze Age, such as the Håga mound outside of Uppsala, and burial cairns and bronze artefacts in northern Sweden. Furthermore, we discuss eastern contacts with areas in present-day Finland and Russia, and how these have been interpreted in middle and northern Sweden. In our view, there is a need to critically examine interregional contacts and the construction of regional entities and borders in the Bronze Age. In order to better understand the relations between north and south, it is necessary to critically examine the research history behind the present-day conceptions of regions and borders, as well as the political dimensions and power relations involved.

7 citations en Geography
arXiv Open Access 2020
Measurement of secondary cosmic-ray neutrons near the geomagnetic North Pole

Richard S. Woolf, Laurel E. Sinclair, Reid A. Van Brabant et al.

The spectrum of cosmogenic neutrons at Earth's surface covers a wide energy range, from thermal to several GeV. The flux of secondary neutrons varies with latitude, elevation, solar activity, and nearby material, including ground moisture. We report the results from a campaign to measure count rates in neutron detectors responding to three different energy ranges conducted near the geomagnetic North Pole at CFS Alert, Nunavut, Canada (82.5 degrees N, 62.5 degrees W; vertical geomagnetic cutoff rigidity, RC = 0 GV) in June of 2016. In November 2016, we performed a follow-on measurement campaign in southern Canada at similar RC (1.5 GV) and elevations. We conducted these measurements, at varying elevation and ground moisture content, with unmoderated and moderated 3He detectors for thermal and epithermal-to-MeV sensitivity, and with EJ-299-33 pulse shape discrimination plastic scintillator detectors for fast neutrons. Background gamma rays were monitored with NaI(Tl) detectors. Using these data sets, we compared the measured count rates to a predictive model. This is the first ever data set taken from this location on Earth. We find that for the thermal and epithermal-to-MeV neutron measurements the predictive model and data are in good agreement, except at one location on rock-covered ground near 1 km elevation. The discrepancy at that location may be attributable to ground moisture variability. Other measurements, during this campaign and prior, support the assertion that ground moisture plays a critical role in determining neutron flux.

en physics.ins-det
arXiv Open Access 2019
Towards Robust Named Entity Recognition for Historic German

Stefan Schweter, Johannes Baiter

Recent advances in language modeling using deep neural networks have shown that these models learn representations that vary with the network depth from morphology to semantic relationships like co-reference. We apply pre-trained language models to low-resource named entity recognition for Historic German. We show in a series of experiments that character-based pre-trained language models do not run into trouble when faced with low-resource datasets. Our pre-trained character-based language models improve upon classical CRF-based methods and previous work on Bi-LSTMs by boosting F1 score performance by up to 6%. Our pre-trained language and NER models are publicly available at https://github.com/stefan-it/historic-ner.

en cs.CL
arXiv Open Access 2017
Stress Testing German Industry Sectors: Results from a Vine Copula Based Quantile Regression

Matthias Fischer, Daniel Kraus, Marius Pfeuffer et al.

Measuring interdependence between probabilities of default (PDs) in different industry sectors of an economy plays a crucial role in financial stress testing. Thereby, regression approaches may be employed to model the impact of stressed industry sectors as covariates on other response sectors. We identify vine copula based quantile regression as an eligible tool for conducting such stress tests as this method has good robustness properties, takes into account potential nonlinearities of conditional quantile functions and ensures that no quantile crossing effects occur. We illustrate its performance by a data set of sector specific PDs for the German economy. Empirical results are provided for a rough and a fine-grained industry sector classification scheme. Amongst others, we confirm that a stressed automobile industry has a severe impact on the German economy as a whole at different quantile levels whereas e.g., for a stressed financial sector the impact is rather moderate. Moreover, the vine copula based quantile regression approach is benchmarked against both classical linear quantile regression and expectile regression in order to illustrate its methodological effectiveness in the scenarios evaluated.

en stat.AP
arXiv Open Access 2017
Measuring Gender Inequalities of German Professions on Wikipedia

Olga Zagovora

Wikipedia is a community-created online encyclopedia; arguably, it is the most popular and largest knowledge resource on the Internet. Thus, reliability and neutrality are of high importance for Wikipedia. Previous research [3] reveals gender bias in Google search results for many professions and occupations. Also, Wikipedia was criticized for existing gender bias in biographies [4] and gender gap in the editor community [5, 6]. Thus, one could expect that gender bias related to professions and occupations may be present in Wikipedia. The term gender bias is used here in the sense of conscious or unconscious favoritism towards one gender over another [47] with respect to professions and occupations. The objective of this work is to identify and assess gender bias. To this end, the German Wikipedia articles about professions and occupations were analyzed on three dimensions: redirections, images, and people mentioned in the articles. This work provides evidence for systematic overrepresentation of men in all three dimensions; female bias is only present for a few professions.

en cs.CY, cs.SI

Page 36 of 47451