Jeang-Yean Goak
Results for "Germanic languages. Scandinavian languages"
Showing 20 of ~329,411 results · from DOAJ, arXiv, CrossRef, Semantic Scholar
Lasse Mårtensson, Eva Pettersson, Veturliði Óskarsson
It has often been pointed out that the use of abbreviations is characteristic of Old West Norse manuscripts, especially the Icelandic ones. This article examines the use of abbreviations primarily in Old Icelandic manuscripts in comparison with Old Norwegian manuscripts and one Old Swedish manuscript. The material consists of digital editions of manuscripts or parts of manuscripts, mostly from the digital archives of Menota and Emroon. The Icelandic manuscripts in this study date from around 1280 to the beginning of the 16th century, while the Norwegian ones date from 1200–1350, mostly from 1270–1325. The Old Swedish manuscript is dated to around 1280. Among the Icelandic manuscripts, a gradual development towards more abbreviation can be seen over time: in the manuscript from 1280, 33% of the words are abbreviated, while in the manuscript from the beginning of the 16th century, 62% are. No such development is observed in the Norwegian manuscripts, which were, however, written over a shorter period of time. Another tendency is that the degree of abbreviation is lower in poetry, which has a high proportion of unusual words compared with ordinary prose. Significant regional variation in the use of abbreviations can also be observed. The Icelandic manuscripts are considerably more heavily abbreviated than the Norwegian ones; the former vary between 33% and 62% abbreviated words, the latter between 10% and 26%. Furthermore, the Norwegian manuscripts are more heavily abbreviated than the Old Swedish manuscript examined in this study, which has 5.6% abbreviated words, a significantly lower percentage than in the Norwegian manuscripts.
Cormac Anderson, Matthew Scarborough, Lechosław Jocz et al.
Abstract The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words (‘cognates’) pattern across 160 languages of the Indo-European family. IE-CoR is intended as a benchmark dataset for computational research into the evolution of the Indo-European languages. It is structured around 170 reference meanings in core lexicon, and contains 25731 lexeme entries, analysed into 4981 cognate sets. Novel, dedicated structures are used to code all known cases of horizontal transfer. All 13 main documented clades of Indo-European, and their main subclades, are well represented. Time calibration data for each language are also included, as are relevant geographical and social metadata. Data collection was performed by an expert consortium of 89 linguists drawing on 355 cited sources. The dataset is extendable to further languages and meanings and follows the Cross-Linguistic Data Format (CLDF) protocols for linguistic data. It is designed to be interoperable with other cross-linguistic datasets and catalogues, and provides a reference framework for similar initiatives for other language families.
Wojciech Czerwiński, Łukasz Orlikowski
The aim of this paper is to deliver a broad understanding of the class of languages of boundedly ambiguous VASS, that is, k-ambiguous VASS for some natural number k. These are languages of Vector Addition Systems with States (VASS) with the acceptance condition defined by a set of accepting states, such that each accepted word has at most k accepting runs. We develop tools for proving that a given language is not accepted by any k-ambiguous VASS. Using them, we show a few negative results: the lack of certain closure properties for languages of k-ambiguous VASS, and the undecidability of the k-ambiguity problem, namely the question of whether a given VASS language is the language of some k-ambiguous VASS. Finally, we show that the regularity problem is decidable for k-ambiguous VASS.
Federico Pennino, Bianca Raimondi, Massimo Rondelli et al.
Generating accurate and executable code using large language models (LLMs) is challenging for languages with limited public training data compared to popular languages such as Python. This paper introduces a generalizable approach that uses small-scale code versions of the Qwen 2.5 model combined with Group Relative Policy Optimization (GRPO) to enable effective code generation through explicit reasoning steps, which is particularly beneficial for languages with smaller source code databases. Using Prolog as a representative use case -- given its limited online presence -- the initial model faced challenges in generating executable code. After some training steps, the model successfully produces logically consistent and syntactically accurate code by directly integrating reasoning-driven feedback into the reinforcement learning loop. Experimental evaluations using mathematical logic problem benchmarks illustrate significant improvements in reasoning quality, code accuracy, and logical correctness, underscoring the potential of this approach to benefit a wide range of programming languages lacking extensive training resources.
Wojciech Czerwiński, Łukasz Orlikowski
In this work, we extend the undecidability of language equivalence for two-dimensional Vector Addition Systems with States (VASS) accepting by the coverability condition. We show that the problem is undecidable even when one of the two-dimensional VASSs is deterministic and the other is history-deterministic. Moreover, we observe that the languages of two history-deterministic VASSs are equal if and only if each can simulate the other. This observation allows us to extend the undecidability to any equivalence relation between two-sided simulation and language equivalence.
Abhiram Bellur, Razan Alghamdi, Kidus Workneh et al.
Library learning is the process of building a library of common functionalities from a given set of programs. Typically, this process is applied in the context of aiding program synthesis: concise functions can help the synthesizer produce modularized code that is smaller in size. Previous work has focused on functional Lisp-like languages, as their regularity makes them more amenable to extracting repetitive structures. Our work introduces Leroy, which extends existing library learning techniques to imperative higher-level programming languages, with the goal of facilitating reusability and ease of maintenance. Leroy wraps the existing Stitch framework for library learning and converts imperative programs into a Lisp-like format using the AST. Our solution uses Stitch to do a top-down, corpus-guided extraction of repetitive expressions. Further, we prune abstractions that cannot be implemented in the programming language and convert the best abstractions back to the original language. We implement our technique in a tool for a subset of the Python programming language and evaluate it on a large corpus of programs. Leroy achieves a compression ratio of 1.04x of the original code base, with a slight expansion when the library is included. Additionally, we show that our technique prunes invalid abstractions.
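The conversion step the abstract describes (imperative code rendered in a Lisp-like format via the AST) can be pictured with a minimal sketch. This is an illustration of the general idea only, using Python's standard `ast` module; the `to_sexpr` helper is hypothetical and is not the actual Leroy or Stitch code:

```python
import ast

def to_sexpr(node: ast.AST) -> str:
    """Render a tiny subset of a Python AST as a Lisp-like s-expression."""
    if isinstance(node, ast.BinOp):
        op = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}[type(node.op)]
        return f"({op} {to_sexpr(node.left)} {to_sexpr(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(type(node).__name__)

# An imperative-language expression becomes a uniform prefix tree,
# which makes repeated structure easier to detect and abstract out.
print(to_sexpr(ast.parse("a * (b + 1)", mode="eval").body))  # (* a (+ b 1))
```

Once programs are in such a uniform prefix form, a corpus-guided tool can search for repeated subtrees; the pruning of invalid abstractions and the conversion back to the source language mentioned in the abstract are not shown.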
Helena Soini
This article explores the folklore components of Finnish rock poetry in Finnish and Swedish. It highlights how Finnish rock culture has revived an ancient tradition in which poetry and music were intertwined. While the musicians of Amorphis reference "Kalevala" mythology in songs like "Tuonela", "Sampo" and "Kantele", they do not explicitly mention the names of "Kalevala" heroes in their adaptations of runes. The themes of their lyrics take on a universal quality and are set in modern times. The influence of folklore traditions on Finnish authors is evident even in their English-language works. Ville Sorvali of Moonsorrow interprets the runes, trying to understand the intention of their creators, believing that these lyrics resonate with people worldwide. Similarly, Jan Jämsen (Katla), a Swedish-speaking Finn, infuses his native Swedish lyrics for Finntroll with imagery of fantastical creatures and incorporates the Sámi folk chant joik to emphasize their uniqueness. Modern Finnish rock poetry stands out for its multilingualism and its references to mythological themes from Finnish and Scandinavian folk poetry. However, the resurgence of the metrics and imagery of the "Kalevala" and "Kanteletar" in rock poetry was unexpected. Authors use figurative symbolism that may require a deep understanding of Finnish culture to fully appreciate, even in English-language works. Finnish rock poetry is a diverse and vibrant art form that resonates with universal themes and emotions.
N. Svetozarova
This article discusses the history of creative contacts between the great Norwegian playwright Henrik Ibsen and the German poet Christian Morgenstern (1871–1914). Morgenstern's life was short and marred by physical suffering, but fantastically full and diverse in creative terms. A significant part of his lyrical and epistolary legacy was published only after his death, thanks to the efforts of his wife and friends. Morgenstern's translations of Ibsen's works date from the late 19th century, when Samuel Fischer's new publishing house in Berlin (S. Fischer Verlag) decided to publish the complete works of Ibsen in a German translation worthy of the original. The publishing house turned to a young and, at the time, still little-known poet who, in love with Scandinavian literature and with Ibsen, set to work with great enthusiasm: he settled in a family boarding house near Christiania, learned Norwegian in a short time, consulted and corresponded with Ibsen several times, and as a result produced translations of the plays whose German delighted the playwright and earned his high praise. An authorized edition of the translations was printed in Germany between 1898 and 1904 and is now a bibliographic rarity. To this day, many of Ibsen's works are published in Germany in the translation of Christian Morgenstern, who is known primarily as an unsurpassed master of poetic miniatures in a unique style of lyrical humor.
Inês Marques
This article discusses the historical-critical model proposed by Hans Zeller, first defining the concept of version. It then explains that, in this framework, all attested autograph witnesses carry the same level of authority within the textual tradition, since they correspond to the author's will at particular moments in the history of the text. In contrast with this position, the article outlines the ideas underlying the copy-text method, the Anglo-American editorial paradigm that privileges the author's final intention, holding that only the readings representing the author's last will regarding the text should appear in the critical edition.
Jean-Philippe Bernardy, Patrik Jansson
The tensor notation used in several areas of mathematics is a useful one, but it is not widely available to the functional programming community. In a practical sense, the (embedded) domain-specific languages (DSLs) currently in use for tensor algebra either 1. are array-oriented languages that do not enforce or take advantage of tensor properties and algebraic structure, or 2. follow the categorical structure of tensors but require the programmer to manipulate tensors in an unwieldy point-free notation. A deeper issue is that, for tensor calculus, the dominant pedagogical paradigm either assumes an audience comfortable with notational liberties that programmers cannot afford, or focuses on the applied mathematics of tensors, largely leaving their linguistic aspects (the behaviour of variable binding, syntax and semantics, etc.) for readers to figure out by themselves. This state of affairs is hardly surprising because, as we highlight, several properties of standard tensor notation are somewhat exotic from the perspective of lambda calculi. We bridge the gap by defining a DSL, embedded in Haskell, whose syntax closely captures the index notation for tensors in wide use in the literature. The semantics of this EDSL is defined in terms of the algebraic structures that define tensors in their full generality. This way, we believe that our EDSL can serve both as a tool for scientific computing and as a vehicle for expressing and presenting the theory and applications of tensors.
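As a rough illustration of what index notation means here, the contraction c[i][k] = Σ_j a[i][j]·b[j][k] can be written with explicit indices in plain Python. This is a sketch of the notation only; the paper's EDSL is embedded in Haskell and is not reproduced here:

```python
def contract(a, b):
    """Index-notation contraction over the shared index j:
    c[i][k] = sum over j of a[i][j] * b[j][k]."""
    rows, shared, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][j] * b[j][k] for j in range(shared)) for k in range(cols)]
            for i in range(rows)]

# Matrix product expressed as a two-index tensor contraction.
print(contract([[1, 2]], [[3], [4]]))  # [[11]]
```

Note how the indices i, j, k are bound variables of the expression; making that binding behave well inside a lambda calculus is exactly the linguistic issue the abstract raises.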
Ruben Becker, Davide Cenzato, Sung-Hwan Kim et al.
A Wheeler automaton is a finite state automaton whose states admit a total Wheeler order, reflecting the co-lexicographic order of the strings labeling source-to-node paths. A Wheeler language is a regular language admitting an accepting Wheeler automaton. Wheeler languages admit efficient and elegant solutions to hard problems such as automata compression and regular expression matching, therefore deciding whether a regular language is Wheeler is relevant in applications requiring efficient solutions to those problems. In this paper, we show that it is possible to decide whether a DFA with n states and m transitions recognizes a Wheeler language in $O(mn)$ time. This is a significant improvement over the running time $O(n^{13} + m\log n)$ of the previous polynomial-time algorithm (Alanko et al., Information and Computation 2021). A proof-of-concept implementation of this algorithm is available in a public repository. We complement this upper bound with a conditional matching lower bound stating that, unless the strong exponential time hypothesis (SETH) fails, the problem cannot be solved in strongly subquadratic time. The same problem is known to be PSPACE-complete when the input is an NFA (D'Agostino et al., Theoretical Computer Science 2023). Together with that result, our paper essentially closes the algorithmic problem of Wheeler language recognition.
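The co-lexicographic order the abstract refers to is simply lexicographic order on reversed strings, i.e. strings compared from their last character backwards. A minimal sketch of this ordering (not taken from the paper or its implementation):

```python
def colex_key(s: str) -> str:
    """Co-lexicographic order compares strings from the right,
    which is lexicographic order on their reversals."""
    return s[::-1]

# "ba" sorts first because its last character 'a' is smallest.
print(sorted(["ba", "ab", "aab", "b"], key=colex_key))  # ['ba', 'b', 'ab', 'aab']
```

In a Wheeler automaton, the total order on states must be consistent with this order on the strings labelling source-to-node paths.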
Thomas Fritz
This paper starts with a short discussion of the role of the research method of "linguistic landscapes", which is part of the ethnographic paradigm. It presents multilingualism, or rather metrolingualism, in the marketplace: a space where languages meet and intermingle and multilingual repertoires collaborate to make linguistic action possible. The socio-demographic element of linguistic landscapes is also evident in the existence of Vienna's Chinatown. Multilingualism "from below" is contrasted with the country's official language policies.
James E. Knirk
Christopher Hahn, Frederik Schmitt, Julia J. Tillman et al.
We study the generalization abilities of language models when translating natural language into formal specifications with complex semantics. In particular, we fine-tune language models on three datasets consisting of English sentences and their corresponding formal representations: 1) regular expressions (regex), frequently used in programming and search; 2) first-order logic (FOL), commonly used in software verification and theorem proving; and 3) linear-time temporal logic (LTL), which forms the basis for industrial hardware specification languages. Our experiments show that, in these diverse domains, the language models draw on pre-trained knowledge of natural language to generalize, e.g., to new variable names or operator descriptions. Additionally, they achieve competitive performance, and even outperform the state of the art for translating into regular expressions, with the benefits of being easy to access, efficient to fine-tune, and without a particular need for domain-specific reasoning.
Signe Smith Jervelund, T. Eikemo
When the nature and scale of a problem is new, it cannot be approached with standardised methods, because it represents a unique challenge and because all possible solutions may lead to unknown negative consequences [1]. This description fits the challenges posed by COVID-19 well. Because the nature and scale of COVID-19 are new, there are no proven solutions for tackling the pandemic. This is why countries have implemented different strategies, and why these strategies may even change from one day to the next. The Nordic countries are no exception in this respect, and the varying strategies implemented in these countries have led to large variations in the early impact of COVID-19 infections and mortality [2]. The Scandinavian Journal of Public Health will publish a series of special issues dedicated to the short- and long-term social, economic and health-related consequences of COVID-19 in the Nordic countries and beyond. In this first special issue, we present a series of commentaries, empirical articles and one study design article that together highlight not only the uncertainty and complexity of the pandemic, but also some of the opportunities for research and guidance in terms of suggestions for policies. The issue also includes a memorial to honour a fellow research colleague who unexpectedly died at a young age during the worst of the pandemic. Although many of the challenges posed by COVID-19 are new to modern society, the world has faced pandemics before. Above all, they have taught us that pandemics do not hit countries, societies and individuals equally. In fact, they are experienced unequally, with higher rates of infection and short- and long-term morbidity and mortality among the most disadvantaged groups – particularly in more socially unequal countries [3]. Emerging evidence from a variety of countries suggests that these inequalities are being mirrored today in the COVID-19 pandemic [4].
Both then and now, these inequalities have emerged through the syndemic nature of COVID-19 as it interacts with existing social inequalities in chronic disease and the social determinants of health [3]. This happens because people living in poor areas have a higher proportion of almost all known underlying risk factors (such as high blood pressure, diabetes, asthma, chronic obstructive pulmonary disease, heart disease, liver disease, cancer, obesity and smoking) that increase the severity and mortality of COVID-19. Ethnic minorities [5], nursing home residents [5], the elderly [5,6], refugees [7], the homeless, migrants, sex workers and prison inmates are just some examples of marginalised groups that have a higher proportion of chronic disease than the rest of the population. Social inequalities in underlying chronic diseases are the result of social inequalities in access to necessary benefits such as education, good working conditions, housing, food, clean water and healthcare. When the pandemic hits poor countries, and poor areas within countries, they are hit harder because chronic conditions are already over-represented there. It also happens because one of the most effective measures to reduce the spread of the coronavirus is social distancing, which is more challenging for people living in crowded housing conditions with little opportunity for self-isolation, often the case in poor areas, and for people who work in the service and health sectors, with greater contact with other citizens and less job flexibility. It happens as well because of language challenges for some citizens from an ethnic minority background, who do not have the same opportunities to understand the changing recommendations from the health authorities and thus apply the appropriate measures.
Lodewijk Bergmans, Xander Schrijen, Edwin Ouwehand et al.
It is well-known, and often a topic of heated debates, that programs in some programming languages are more concise than in others. This is a relevant factor when comparing or aggregating volume-impacted metrics on source code written in a combination of programming languages. In this paper, we present a model for measuring the conciseness of programming languages in a consistent, objective and evidence-based way. We present the approach, explain how it is founded on information theoretical principles, present detailed analysis steps and show the quantitative results of applying this model to a large benchmark of diverse commercial software applications. We demonstrate that our metric for language conciseness is strongly correlated with both an alternative analytical approach, and with a large scale developer survey, and show how its results can be applied to improve software metrics for multi-language applications.
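The abstract does not give the model's formulas, but the information-theoretic intuition (concise code carries more information per character) can be sketched with a crude compression-based proxy. The `info_density` helper below is purely illustrative and is not the paper's metric:

```python
import zlib

def info_density(source: str) -> float:
    """Compressed size over raw size: a rough proxy for information
    content per character. Repetitive (less concise) code scores lower."""
    raw = source.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

boilerplate = "x = x + 1\n" * 50   # highly repetitive
terse = "total += delta\n"         # short, little redundancy
print(info_density(boilerplate) < info_density(terse))  # True
```

A per-language factor of this kind is one way to normalise volume-impacted metrics before aggregating them across a multi-language code base, which is the use case the abstract describes.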
James M. Stratton
While the study of English intensifiers has been a topic of much empirical discussion (Bolinger 1972, Paradis 1997, Ito & Tagliamonte 2003, Xiao & Tao 2007, Fuchs 2017), intensification in the German language is underexplored. The present study operationalizes variationist methods to comprehensively examine the syntactic intensification of adjectives in German by investigating how adjective intensifiers rank empirically in terms of frequency and whether their use is sensitive to the social factors gender and age. Results indicate that in German, amplifiers are more frequent than downtoners, boosters are more frequent than maximizers, and the gender and the age of the speaker are factors that influence their use. These findings corroborate crosslinguistic findings (Peters 1994, Broekhuis 2013, D’Arcy 2015, Fuchs 2017). Broadly speaking, the present study suggests that the syntactic intensification of adjectives in German is, in many ways, similar to what has been observed previously in other Germanic languages.*
Sam Blackshear, David L. Dill, Shaz Qadeer et al.
Smart contracts are programs that implement potentially sophisticated transactions on modern blockchain platforms. In the rapidly evolving blockchain environment, smart contract programming languages must allow users to write expressive programs that manage and transfer assets, yet provide strong protection against sophisticated attacks. Addressing this need, we present flexible and reliable abstractions for programming with digital currency in the Move language [Blackshear et al. 2019]. Move uses novel linear [Girard 1987] resource types with semantics drawing on C++11 [Stroustrup 2013] and Rust [Matsakis and Klock 2014]: when a resource value is assigned to a new memory location, the location previously holding it must be invalidated. In addition, a resource type can only be created or destroyed by procedures inside its declaring module. We present an executable bytecode language with resources and prove that it enjoys resource safety, a conservation property for program values that is analogous to conservation of mass in the physical world.
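Move's linear resource discipline (assigning a resource elsewhere invalidates the location that previously held it) is enforced statically by its type system; the sketch below only mimics it dynamically, as a loose Python analogy rather than anything from the Move language itself:

```python
class Coin:
    """A pretend linear resource: once moved out, the old binding is dead."""
    def __init__(self, amount: int):
        self.amount = amount
        self._moved = False

    def move_out(self) -> "Coin":
        """Transfer ownership, invalidating this holder (use-after-move fails)."""
        if self._moved:
            raise RuntimeError("use after move: resource was already transferred")
        self._moved = True
        return Coin(self.amount)

wallet = Coin(100)
payment = wallet.move_out()      # the value moves; `wallet` is now invalid
try:
    wallet.move_out()            # reusing the old location is an error
except RuntimeError as err:
    print(err)
```

In Move this check happens at compile time on the bytecode, and resource safety additionally guarantees that values are neither duplicated nor silently dropped; this toy class illustrates only the invalidation-on-move aspect.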
Raphael Scheible, Fabian Thomczyk, Patric Tippmann et al.
Lately, pre-trained language models have advanced the field of natural language processing (NLP). The introduction of Bidirectional Encoder Representations from Transformers (BERT) and its optimized version RoBERTa had a significant impact and increased the relevance of pre-trained models. Research in this field started mainly on English data, followed by models trained on multilingual text corpora. However, current research shows that multilingual models are inferior to monolingual models. To date, no single-language German RoBERTa model has been published; we introduce one in this work (GottBERT). The German portion of the OSCAR data set was used as the text corpus. In an evaluation, we compare its performance on the two Named Entity Recognition (NER) tasks CoNLL 2003 and GermEval 2014, as well as on the text classification tasks GermEval 2018 (fine and coarse) and GNAD, against existing German single-language BERT models and two multilingual ones. GottBERT was pre-trained following the original RoBERTa procedure using fairseq. All downstream tasks were trained using hyperparameter presets taken from the benchmark of German BERT. The experiments were set up using FARM. Performance was measured by the $F_{1}$ score. GottBERT was successfully pre-trained on a 256-core TPU pod using the RoBERTa BASE architecture. Even without extensive hyperparameter optimization, GottBERT already outperformed all other tested German and multilingual models in all NER tasks and one of the text classification tasks. In order to support the German NLP field, we publish GottBERT under the AGPLv3 license.
Page 31 of 16471