Hasil untuk "Germanic languages. Scandinavian languages"

Showing 20 of ~329,336 results · from DOAJ, arXiv, Semantic Scholar, CrossRef

DOAJ Open Access 2025
Skandinavische Forschungen am Institut für Germanistik der Universität Wrocław im Zeitraum 2010–2024

Józef Jarosz

This bibliography continues the earlier bibliographic documentation of the scholarly activity of the staff of the Research Centre for Scandinavian Studies at the Universität Wrocław. It covers publications produced at the Institut für Germanistik between 2010 and 2024 that engage with the Scandinavian countries, their languages, and their cultures. The entries are listed in chronological-alphabetical order, with the type of publication serving as an additional criterion: book titles come first (monographs, academic textbooks, and dictionaries), followed by articles, reviews, translations, and other publications.

Germanic languages. Scandinavian languages, German literature
DOAJ Open Access 2025
Die Rolle des Auswendiglernens im Kontext moderner Fremdsprachendidaktik: Eine Analyse verschiedener Perspektiven und Strategien

Magdalena Białek

This article examines the role of memorization in the modern didactic context. It aims to identify the situations and concepts in which memorization can be considered useful, looking at both the theoretical perspective and the practical application of memorization. The article opens with a discussion of the status quo of memorization, noting that it can be viewed not merely as mechanical repetition but also as a cognitive process that includes understanding what is learned. One focus is memorization in early foreign-language acquisition, where traditional methods are contrasted with modern approaches; it is argued that ritualized language is of crucial importance for children and contributes to the automatization of language skills. Various memorization strategies in classroom practice are then discussed. Finally, memorization is considered as a learning strategy, with the emphasis that it should not be viewed in isolation but as part of a broader learning process. Overall, the article offers a comprehensive look at the role of memorization in the modern educational context.

Germanic languages. Scandinavian languages, German literature
arXiv Open Access 2025
Counting and Sampling Traces in Regular Languages

Alexis de Colnet, Kuldeep S. Meel, Umang Mathur

In this work, we study the problems of counting and sampling Mazurkiewicz traces that a regular language touches. Fix an alphabet $Σ$ and an independence relation $\mathbb{I} \subseteq Σ\times Σ$. The input consists of a regular language $L \subseteq Σ^*$, given by a finite automaton with $m$ states, and a natural number $n$ (in unary). For the counting problem, the goal is to compute the number of Mazurkiewicz traces (induced by $\mathbb{I}$) that intersect the $n^\text{th}$ slice $L_n = L \cap Σ^n$, i.e., traces that admit at least one linearization in $L_n$. For the sampling problem, the goal is to output a trace drawn from a distribution that is approximately uniform over all such traces. These tasks are motivated by bounded model checking with partial-order reduction, where an \emph{a priori} estimate of the reduced state space is valuable, and by testing methods for concurrent programs that use partial-order-aware random exploration. We first show that the counting problem is #P-hard even when $L$ is accepted by a deterministic automaton, in sharp contrast to counting words of a DFA, which is polynomial-time solvable. We then prove that the problem lies in #P for both NFAs and DFAs, irrespective of whether $L$ is trace-closed. Our main algorithmic contributions are a \emph{fully polynomial-time randomized approximation scheme} (FPRAS) that, with high probability, approximates the desired count within a prescribed accuracy, and a \emph{fully polynomial-time almost uniform sampler} (FPAUS) that generates traces whose distribution is provably close to uniform.
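For contrast, the polynomial-time baseline the abstract mentions, counting the words of length $n$ accepted by a DFA, is a standard dynamic program over states. A minimal sketch (the example automaton is illustrative, not taken from the paper):

```python
def count_words(delta, start, accepting, n):
    """Count words of length n accepted by a DFA via dynamic programming.

    delta: dict mapping (state, symbol) -> state
    Runs in O(n * |delta|) time -- polynomial, unlike trace counting.
    """
    # counts[q] = number of length-k words driving the DFA from `start` to q
    counts = {start: 1}
    for _ in range(n):
        nxt = {}
        for q, c in counts.items():
            for (p, _sym), r in delta.items():
                if p == q:
                    nxt[r] = nxt.get(r, 0) + c
        counts = nxt
    return sum(c for q, c in counts.items() if q in accepting)

# Example: DFA over {a, b} accepting words with an even number of a's.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
print(count_words(delta, start=0, accepting={0}, n=3))  # 4 of the 8 length-3 words
```

The paper's point is that the analogous count over Mazurkiewicz traces, where many words collapse into one trace, is #P-hard even for DFAs, so no such simple recurrence is expected.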

en cs.FL, cs.CC
arXiv Open Access 2025
Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing

Atharva Mutsaddi, Aditya Choudhary

Plagiarism involves using another person's work or concepts without proper attribution, presenting them as original creations. With the growing amount of data communicated in Indian regional languages such as Marathi, it is crucial to design robust plagiarism detection systems tailored for low-resource languages. Language models like Bidirectional Encoder Representations from Transformers (BERT) have demonstrated exceptional capability in text representation and feature extraction, making them essential tools for semantic analysis and plagiarism detection. However, the application of BERT to low-resource languages remains under-explored, particularly in the context of plagiarism detection. This paper presents a method to enhance the accuracy of plagiarism detection for Marathi texts using BERT sentence embeddings in conjunction with Term Frequency-Inverse Document Frequency (TF-IDF) feature representation. This approach effectively captures statistical, semantic, and syntactic aspects of text features through a weighted voting ensemble of machine learning models.
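As a rough illustration of combining a statistical signal with a semantic one: the sketch below builds toy TF-IDF vectors and mixes a TF-IDF similarity with an embedding similarity via a fixed weighting. The weights and helper names are illustrative assumptions; the paper's actual system uses BERT sentence embeddings and a weighted voting ensemble of trained ML models, not a hand-set linear blend.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def ensemble_score(tfidf_sim, bert_sim, w_tfidf=0.4, w_bert=0.6):
    """Weighted combination of the two similarity signals (weights assumed)."""
    return w_tfidf * tfidf_sim + w_bert * bert_sim
```

A document pair would be flagged when `ensemble_score` exceeds some tuned threshold; in the paper this decision is made by the voting ensemble rather than a threshold on a linear mix.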

en cs.CL, cs.AI
arXiv Open Access 2025
Evaluation of the Code Generation Capabilities of ChatGPT 4: A Comparative Analysis in 19 Programming Languages

L. C. Gilbert

This bachelor's thesis examines the capabilities of ChatGPT 4 in code generation across 19 programming languages. The study analyzed solution rates across three difficulty levels, types of errors encountered, and code quality in terms of runtime and memory efficiency through a quantitative experiment. A total of 188 programming problems were selected from the LeetCode platform, and ChatGPT 4 was given three attempts to produce a correct solution with feedback. ChatGPT 4 successfully solved 39.67% of all tasks, with success rates decreasing significantly as problem complexity increased. Notably, the model faced considerable challenges with hard problems across all languages. ChatGPT 4 demonstrated higher competence in widely used languages, likely due to a larger volume and higher quality of training data. The solution rates also revealed a preference for languages with low abstraction levels and static typing. For popular languages, the most frequent error was "Wrong Answer," whereas for less popular languages, compiler and runtime errors prevailed, suggesting frequent misunderstandings and confusion regarding the structural characteristics of these languages. The model exhibited above-average runtime efficiency in all programming languages, showing a tendency toward statically typed and low-abstraction languages. Memory efficiency results varied significantly, with above-average performance in 14 languages and below-average performance in five languages. A slight preference for low-abstraction languages and a leaning toward dynamically typed languages in terms of memory efficiency were observed. Future research should include a larger number of tasks, iterations, and less popular languages. Additionally, ChatGPT 4's abilities in code interpretation and summarization, debugging, and the development of complex, practical code could be analyzed further.

en cs.SE, cs.AI
arXiv Open Access 2024
Building pre-train LLM Dataset for the INDIC Languages: a case study on Hindi

Shantipriya Parida, Shakshi Panwar, Kusum Lata et al.

Large language models (LLMs) have demonstrated transformative capabilities in many applications that require automatically generating responses based on human instruction. However, a major challenge in building LLMs, particularly for Indic languages, is the availability of high-quality data for training foundation models. In this paper, we propose a large pre-training dataset for Hindi, one of the major Indic languages. The collected data spans several domains, including major Hindi dialects, and contains 1.28 billion Hindi tokens. We describe our pipeline, covering data collection, pre-processing, and availability for LLM pre-training. The approach can be easily extended to other Indic and low-resource languages, and the dataset will be freely available for LLM pre-training and research purposes.

en cs.CL, cs.AI
arXiv Open Access 2024
Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias

Jayanta Sadhu, Maneesha Rani Saha, Rifat Shahriyar

The rapid growth of Large Language Models (LLMs) has made the study of biases a crucial field. It is important to assess the influence of different types of biases embedded in LLMs to ensure fair use in sensitive domains. Although there has been extensive work on bias assessment in English, such efforts remain scarce for a major language like Bangla. In this work, we examine two types of social biases in LLM-generated outputs for the Bangla language. Our main contributions are: (1) bias studies on two different social biases for Bangla, (2) a curated dataset for bias measurement benchmarking, and (3) testing two different probing techniques for bias detection in the context of Bangla. To the best of our knowledge, this is the first work on bias assessment of LLMs for Bangla. All our code and resources are publicly available to support bias-related research in Bangla NLP.

en cs.CL
arXiv Open Access 2024
Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal

Elodie Gauthier, Aminata Ndiaye, Abdoulaye Guissé

This work is part of the Kallaama project, whose objective is to produce and disseminate national-language corpora for speech technology development in the field of agriculture. Except for Wolof, which benefits from some language data for natural language processing, the national languages of Senegal are largely ignored by language technology providers. Yet such technologies are key to the protection, promotion, and teaching of these languages. Kallaama focuses on the three main languages spoken by Senegalese people: Wolof, Pulaar and Sereer. These languages are widely spoken by the population, with around 10 million native speakers in Senegal, not to mention those outside the country. However, they remain under-resourced in terms of machine-readable data usable for automatic processing and language technologies, all the more so in the agricultural sector. We release a transcribed speech dataset containing 125 hours of recordings about agriculture in each of the above-mentioned languages. These resources are specifically designed for Automatic Speech Recognition purposes, including traditional approaches. To support building such technologies, we provide textual corpora in Wolof and Pulaar, and a pronunciation lexicon containing 49,132 entries from the Wolof dataset.

en cs.CL
arXiv Open Access 2024
Thoughts on Learning Human and Programming Languages

Daniel S. Katz, Jeffrey C. Carver

This is a virtual dialog between Jeffrey C. Carver and Daniel S. Katz on how people learn programming languages. It is based on a talk Jeff gave at the first US-RSE Conference (US-RSE'23), which led Dan to think about human languages versus computer languages. Dan discussed this with Jeff at the conference, and the discussion continued asynchronously, with this column serving as a record of it.

en cs.SE, cs.CY
arXiv Open Access 2023
A Survey of Corpora for Germanic Low-Resource Languages and Dialects

Verena Blaschke, Hinrich Schütze, Barbara Plank

Despite much progress in recent years, the vast majority of work in natural language processing (NLP) is on standard languages with many speakers. In this work, we instead focus on low-resource languages and in particular non-standardized low-resource languages. Even within branches of major language families, often considered well-researched, little is known about the extent and type of available resources and what the major NLP challenges are for these language varieties. The first step to address this situation is a systematic survey of available corpora (most importantly, annotated corpora, which are particularly valuable for NLP research). Focusing on Germanic low-resource language varieties, we provide such a survey in this paper. Except for geolocation (origin of speaker or document), we find that manually annotated linguistic resources are sparse and, if they exist, mostly cover morphosyntax. Despite this lack of resources, we observe that interest in this area is increasing: there is active development and a growing research community. To facilitate research, we make our overview of over 80 corpora publicly available. We share a companion website of this overview at https://github.com/mainlp/germanic-lrl-corpora .

en cs.CL
arXiv Open Access 2023
Polynomial definability in constraint languages with few subpowers

Jakub Bulín, Michael Kompatscher

A first-order formula is called primitive positive (pp) if it only admits the use of existential quantifiers and conjunction. Pp-formulas are a central concept in (fixed-template) constraint satisfaction since CSP($Γ$) can be viewed as the problem of deciding the primitive positive theory of $Γ$, and pp-definability captures gadget reductions between CSPs. An important class of tractable constraint languages $Γ$ is characterized by having few subpowers, that is, the number of $n$-ary relations pp-definable from $Γ$ is bounded by $2^{p(n)}$ for some polynomial $p(n)$. In this paper we study a restriction of this property, stating that every pp-definable relation is definable by a pp-formula of polynomial length. We conjecture that the existence of such short definitions is actually equivalent to $Γ$ having few subpowers, and verify this conjecture for a large subclass that, in particular, includes all constraint languages on three-element domains. We furthermore discuss how our conjecture imposes an upper complexity bound of co-NP on the subpower membership problem of algebras with few subpowers.
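For concreteness, an illustrative example not taken from the paper: over a digraph template with edge relation $E$, the binary "connected by a path of length two" relation is pp-definable, since $R(x,y) \equiv \exists z \, (E(x,z) \wedge E(z,y))$ uses only existential quantification and conjunction. Counting how many distinct $n$-ary relations arise this way is exactly what the few-subpowers condition bounds by $2^{p(n)}$; the conjecture here adds that each such relation should also admit a defining pp-formula of polynomial length.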

en cs.LO, math.LO
DOAJ Open Access 2022
Literatura e escrita criativa em sala de aula invertida de alemão como língua estrangeira durante a pandemia de Covid-19

Adriana Borgerth V. C. Lima

During the Covid-19 pandemic in 2020, teachers and students found themselves compelled to adopt remote-teaching strategies so as not to interrupt their classes. In this context, the "flipped classroom" pedagogical model (Leffa; Duarte; Alda 2016) (Andrade; Coutinho 2018) made possible the online implementation of the proposal "Literature and creative writing in a flipped German-as-a-foreign-language classroom." Literature can serve as an alternative to the non-authentic texts of foreign-language textbooks, providing rich contexts of vocabulary and correctly used structures (Koppensteiner 2001). Creative writing builds the writer's repertoire on a foundation of reading and, through technique, recovers the writer's creativity (Rodrigues 2015). The foreign-language teacher can draw on this creativity to work on writing in class, alongside grammar, reading, and speaking (Silva 2013). A pair of students and one individual student at level A2 (CEFR 2001) took part in the proposal; five books by German-language authors were read, and the themes the works address were discussed, contextualized, and transposed to the present day, allowing the students to comment on them within the Brazilian cultural context. This process underpinned the writing of the students' own stories.

German literature, Germanic languages. Scandinavian languages
arXiv Open Access 2022
High-level Synthesis using the Julia Language

Benjamin Biggs, Ian McInerney, Eric C. Kerrigan et al.

The growing proliferation of FPGAs and High-level Synthesis (HLS) tools has led to a large interest in designing hardware accelerators for complex operations and algorithms. However, existing HLS toolflows typically require a significant amount of user knowledge or training to be effective in both industrial and research applications. In this paper, we propose using the Julia language as the basis for an HLS tool. The Julia HLS tool aims to decrease the barrier to entry for hardware acceleration by taking advantage of the readability of the Julia language and by allowing the use of the existing large library of standard mathematical functions written in Julia. We present a prototype Julia HLS tool, written in Julia, that transforms Julia code to VHDL. We highlight how features of Julia and its compiler simplified the creation of this tool, and we discuss potential directions for future work.

en cs.SE, cs.AR
CrossRef Open Access 2021
Resilient Subject Agreement Morpho-Syntax in the Germanic Romance Contact Area

Cecilia Poletto, Alessandra Tomaselli

In this work, we intend to investigate one fundamental aspect of language contact by comparing the distribution of subjects in German, Northern Italian dialects and Cimbrian. Here, we show that purely syntactic order phenomena are more prone to convergence, i.e., less resilient, while phenomena that have a clearly identifiable morphological counterpart are more resilient. The empirical domain of investigation for our analysis is the morphosyntax of both nominal and pronominal subjects, the agreement pattern and their position in Cimbrian grammar. While agreement patterns display a highly conservative paradigm, the syntax of nominal (vP-peripheral and topicalized) subjects is innovative and mimics the Italian linear word order.

DOAJ Open Access 2021
‘Interculturality’ and ‘Intercultural Literature’: A Contribution to the Discussion and Exemplary Testing of the New Basic Terms of Intercultural Literary Studies

Aylin Nadine Kul

This article discusses the concepts of ‘interculturality’ and ‘intercultural literature’, which circulate as important basic concepts in intercultural literary studies. Since the boundaries of these terms often blur into each other, the paper strives to define their usage clearly. Their use is illustrated through a discussion of Thorsten Becker’s novel Sieger nach Punkten (Winner on Points). The criteria for literary interculturality established by Blioumi (2002) and the components of intercultural literature identified by Chiellino (2002) serve as the basis for the analysis. Testing these approaches on Becker’s novel shows that it fulfills the criteria of literary interculturality (a dynamic concept of culture, double optics, self-criticism, empathy, and hybridity) and is marked by a special intercultural potential. This makes the novel’s significant social contribution evident: Winner on Points promotes intercultural understanding between Turkish and German societies. Examining Chiellino’s criteria for intercultural literature, namely the presence of an intercultural memory, an intercultural interlocutor, and so-called linguistic latency, leads to the conclusion that Winner on Points can be classified as intercultural literature. Overall, the analysis serves not only as an example of applying the approaches elaborated by Blioumi and Chiellino to a text, but also deepens and differentiates the meaning and use of the basic terms of intercultural literary studies.

German literature, Germanic languages. Scandinavian languages
arXiv Open Access 2021
Sign and Search: Sign Search Functionality for Sign Language Lexica

Manolis Fragkiadakis, Peter van der Putten

Sign language lexica are a useful resource for researchers and people learning sign languages. Current implementations allow a user to search for a sign either by its gloss or by selecting its primary features such as handshape and location. This study explores a reverse search functionality in which a user signs a query sign in front of a webcam and retrieves a set of matching signs. By extracting different body-joint combinations (upper body, dominant hand's arm and wrist) using the pose estimation framework OpenPose, we compare four techniques (PCA, UMAP, DTW and Euclidean distance) as distance metrics between 20 query signs, each performed by eight participants, on a 1,200-sign lexicon. The results show that UMAP and DTW can predict a matching sign with 80% and 71% accuracy respectively within the top-20 retrieved signs using the movement of the dominant hand's arm. Using DTW and adding more sign instances from other participants to the lexicon, the accuracy can be raised to 90% within the top-10 ranking. Our results suggest that our methodology can be used without training on any sign language lexicon regardless of its size.
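Of the four distance metrics compared, dynamic time warping (DTW) is the one that tolerates differences in signing speed, since it aligns sequences non-linearly before summing pointwise distances. A minimal one-dimensional sketch of the DTW recurrence (the study applies it to multi-dimensional OpenPose joint trajectories, which are omitted here):

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between two sequences (pure-Python sketch)."""
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] aligned alone
                                 cost[i][j - 1],      # b[j-1] aligned alone
                                 cost[i - 1][j - 1])  # matched pair
    return cost[len(a)][len(b)]

# A sequence and a slowed-down copy of it align at zero cost:
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
```

This speed-invariance is a plausible reason DTW ranks query signs well even when participants sign at different tempos; ranking would sort the lexicon by `dtw_distance` to the query trajectory.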

en cs.CV, cs.IR
arXiv Open Access 2020
Transfer Learning for British Sign Language Modelling

Boris Mocialov, Graham Turner, Helen Hastie

Automatic speech recognition and spoken dialogue systems have made great advances through the use of deep machine learning methods. This is partly due to greater computing power, but also to the large amount of data available in common languages such as English. Conversely, research in minority languages, including sign languages, is hampered by a severe lack of data. This has led to work on transfer learning methods, whereby a model developed for one language is reused as the starting point for a model in a second, less-resourced language. In this paper, we examine two transfer learning techniques, fine-tuning and layer substitution, for language modelling of British Sign Language. Our results show an improvement in perplexity when using transfer learning with standard stacked LSTM models, trained initially on a large standard-English corpus from the Penn Treebank.
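Perplexity, the metric this abstract reports improving, is the exponentiated mean negative log-probability a language model assigns to held-out tokens; lower is better. A minimal sketch (the probabilities here are illustrative, not outputs of the paper's models):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning uniform probability 1/4 to every token is "as uncertain
# as" a 4-way coin flip, giving perplexity 4; sharper predictions lower it.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

In the transfer-learning setting, the fine-tuned model's lower perplexity on BSL gloss sequences indicates it assigns higher probability to the held-out data than a model trained on BSL alone.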

en cs.CL

Page 7 of 16467