Andrew Kostakis
Results for "Germanic languages. Scandinavian languages"
Showing 20 of ~320,628 results · from CrossRef, arXiv, DOAJ
Yifan Zong, Yuntian Deng, Pengyu Nie
Large language models (LLMs) have demonstrated impressive capabilities in aiding developers with tasks like code comprehension, generation, and translation. Supporting multilingual programming -- i.e., coding tasks across multiple programming languages -- typically requires either (1) finetuning a single LLM across all programming languages, which is cost-efficient but sacrifices language-specific specialization and performance, or (2) finetuning separate LLMs for each programming language, which allows for specialization but is computationally expensive and storage-intensive due to the duplication of parameters. This paper introduces MoLE (Mix-of-Language-Experts), a novel architecture that balances efficiency and specialization for multilingual programming. MoLE is composed of a base model, a shared LoRA (low-rank adaptation) module, and a collection of language-specific LoRA modules. These modules are jointly optimized during the finetuning process, enabling effective knowledge sharing and specialization across programming languages. During inference, MoLE automatically routes to the language-specific LoRA module corresponding to the programming language of the code token being generated. Our experiments demonstrate that MoLE achieves greater parameter efficiency compared to training separate language-specific LoRAs, while outperforming a single shared LLM finetuned for all programming languages in terms of accuracy.
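A minimal sketch of the layer structure the abstract describes: a frozen base weight, one shared LoRA module, and per-language LoRA modules whose low-rank updates are added to the base output. Everything concrete here is an assumption for illustration: the hypothetical classes `LoRA` and `MoLELayer`, rank-1 adapters, pure-Python matrices, and routing by an explicit language tag per call (the abstract describes routing per generated code token). It is not the authors' implementation.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise vector sum."""
    return [x + y for x, y in zip(a, b)]


class LoRA:
    """Hypothetical low-rank update scale * B @ (A @ x); A is r x d_in, B is d_out x r."""

    def __init__(self, a: Matrix, b: Matrix, scale: float = 1.0) -> None:
        self.a, self.b, self.scale = a, b, scale

    def __call__(self, x: Vector) -> Vector:
        return [self.scale * y for y in matvec(self.b, matvec(self.a, x))]


class MoLELayer:
    """Hypothetical layer: frozen base weight + shared LoRA + one LoRA per language."""

    def __init__(self, base: Matrix, shared: LoRA, experts: Dict[str, LoRA]) -> None:
        self.base, self.shared, self.experts = base, shared, experts

    def forward(self, x: Vector, language: str) -> Vector:
        # The base weight and the shared adapter are always applied.
        out = add(matvec(self.base, x), self.shared(x))
        # Route to the adapter of the current programming language, if one exists.
        expert = self.experts.get(language)
        if expert is not None:
            out = add(out, expert(x))
        return out


identity = [[1.0, 0.0], [0.0, 1.0]]
layer = MoLELayer(
    base=identity,
    shared=LoRA(a=[[1.0, 1.0]], b=[[0.5], [0.5]]),
    experts={
        "python": LoRA(a=[[1.0, 0.0]], b=[[1.0], [0.0]]),
        "java": LoRA(a=[[0.0, 1.0]], b=[[0.0], [1.0]]),
    },
)
print(layer.forward([1.0, 2.0], "python"))  # [3.5, 3.5]
```

Only the shared adapter and the routed adapter touch a given input, so adding a language costs one extra low-rank pair rather than a full copy of the model, which is the efficiency argument the abstract makes.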
Jan M. Boelmann, Lisa König, Jaron Müller
Janusz Stopyra
The article seeks to draw the reader's attention to the parallel word-formation patterns of German and Danish, which serve as the methodological basis of a teaching programme in which students of German learn Danish as a second foreign language. At the same time, it also aims to show the best-known deviations of Danish morphology and syntax from those of German, and thus the distinctive character of Danish. A separate chapter also discusses phonology as the subsystem of Danish that is most difficult for German and Slavic learners.
Kenneth Enevoldsen, Márton Kardos, Niklas Muennighoff et al.
The evaluation of English text embeddings has transitioned from evaluating a handful of datasets to broad coverage across many tasks through benchmarks such as MTEB. However, this is not the case for multilingual text embeddings due to a lack of available benchmarks. To address this problem, we introduce the Scandinavian Embedding Benchmark (SEB). SEB is a comprehensive framework that enables text embedding evaluation for Scandinavian languages across 24 tasks, 10 subtasks, and 4 task categories. Building on SEB, we evaluate more than 26 models, uncovering significant performance disparities between public and commercial solutions not previously captured by MTEB. We open-source SEB and integrate it with MTEB, thus bridging the text embedding evaluation gap for Scandinavian languages.
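To illustrate what evaluating a text embedding on a single task can involve, here is a sketch of one common pattern: retrieval-style scoring, where each query embedding is matched to its most cosine-similar document and top-1 accuracy is reported. The helper names and toy vectors are hypothetical, and top-1 accuracy is a simplification; this is not SEB's code, task list, or metric.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top1_accuracy(queries: List[Vector], docs: List[Vector], gold: List[int]) -> float:
    """Fraction of queries whose most similar document is the gold document."""
    hits = 0
    for query, target in zip(queries, gold):
        best = max(range(len(docs)), key=lambda i: cosine(query, docs[i]))
        hits += int(best == target)
    return hits / len(queries)


# Hypothetical toy embeddings; a real run would embed Scandinavian texts with a model.
queries = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.3]]
docs = [[0.9, 0.1], [0.1, 0.9]]
gold = [0, 1, 1]
print(top1_accuracy(queries, docs, gold))
```

The third query is closer to the first document than to its gold document, so the score is 2/3; a benchmark repeats this kind of scoring over many tasks and models.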
Götz Ursula
Rafael Lomeu Gomes, Toril Opsahl, Unn Røyneland
In this article, we discuss planned imitation of Norwegian and English as second languages used in humour in popular culture. Features associated with speakers of Norwegian as a second language also form part of the repertoire of young people in heterogeneous urban environments, in what has popularly come to be called "kebabnorsk" ("kebab Norwegian"). We explore cases in which such a multiethnolectal style is parodied in order to highlight, challenge, or reinforce identity categories. Newspaper articles published between 2015 and 2021 in which the word "kebabnorsk" occurs are combined with analyses of video clips from TV series that feature humorous imitations of second-language speakers and "kebabnorsk" speakers. Drawing on a raciolinguistic perspective (Flores & Rosa, 2023), theories of authenticity (Coupland, 2003; Woolard, 2016), indexical regimentation (Bucholtz, 2011), the concept of linguistic minstrelsy (Bucholtz & Lopez, 2011), and Bakhtin's concept of the carnivalesque (Bakhtin, 1968, 1981; Kjus, 2005), we discuss cases in which the link between language and body is presented as "natural" rather than socially constructed, as well as cases in which (meta-)parodic representations can challenge stereotypes and contribute to a denaturalization of language/body and language/place connections. The study contributes to a broader understanding of how stereotypes are discursively (re)produced and how power relations are negotiated.
Muhammad Shoaib Farooq, Taymour zaman Khan
Programming is an integral part of the computer science discipline. The programming environment is not only growing rapidly but also changing, and languages are constantly evolving. Learning the object-oriented paradigm is compulsory in every computer science major, so the choice of language for teaching object-oriented principles is very important. Because of the large pool of object-oriented languages, it is difficult to decide which should be the first programming language for teaching object-oriented principles. Many studies have suggested which language should be used first to teach object-oriented concepts, but there is no method for comparing and evaluating these languages. In this article, we propose a comprehensive framework for evaluating widely used object-oriented languages. The languages are evaluated on the basis of their technical and environmental features.
Selmani Lirim
Dominika Skrzypek, Anna Kurek-Przybilski, Alicja Piotrowska
Krasimir Yordzhev
Context-free languages are widely used to describe the syntax of programming languages and natural languages. Usually, a context-free language is described mathematically by a context-free grammar (for generation) or a pushdown automaton (for recognition). The purpose of this study is to describe some unconventional methods for describing context-free languages, namely representations by finite digraphs and by automata-generators of context-free languages. We mainly focus on the mathematical models of these representations.
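For reference, the conventional pair that the abstract contrasts with its digraph-based representations can be sketched for the language of balanced parentheses: the grammar S → ( S ) S | ε generates it, and a single-state pushdown automaton recognizes it. The grammar, the hypothetical helpers `generate` and `pda_accepts`, and the stack symbols are illustrative choices; the sketch does not implement the paper's unconventional representations.

```python
from functools import lru_cache
from typing import FrozenSet


@lru_cache(maxsize=None)
def generate(n: int) -> FrozenSet[str]:
    """All words of length n derivable from S -> ( S ) S | epsilon."""
    if n == 0:
        return frozenset({""})  # S -> epsilon
    words = set()
    for i in range(0, n - 1, 2):  # i = length of the inner S
        for inner in generate(i):
            for rest in generate(n - 2 - i):  # length of the trailing S
                words.add("(" + inner + ")" + rest)
    return frozenset(words)


def pda_accepts(word: str) -> bool:
    """Single-state pushdown recognizer: push on '(', pop on ')', accept on bottom marker only."""
    stack = ["Z"]  # bottom-of-stack marker
    for symbol in word:
        if symbol == "(":
            stack.append("X")
        elif symbol == ")" and stack[-1] == "X":
            stack.pop()
        else:
            return False  # unmatched ')' or a symbol outside the alphabet
    return stack == ["Z"]


print(sorted(generate(4)))  # ['(())', '()()']
print(all(pda_accepts(w) for w in generate(6)))  # True
```

The grammar side enumerates the language length by length, while the automaton side decides membership of one given word with a stack, which is the generation/recognition split the abstract mentions.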
Francisco Manuel García Chicote
This article analyzes Georg Simmel’s concept of culture by reconstructing the genealogy of his theory within its historical, political, and economic context. It examines to what extent Simmel’s ideas on value and division of labor cement his conception of cultural alienation. Finally, it argues that Simmel’s cultural theory is significantly biased by a subjective theory of value, which entails apologetic traits.
Peter D. Mosses
The CBS framework supports component-based specification of programming languages. It aims to significantly reduce the effort of formal language specification, and thereby encourage language developers to exploit formal semantics more widely. CBS provides an extensive library of reusable language specification components, facilitating co-evolution of languages and their specifications. After introducing CBS and its formal definition, this short paper reports work in progress on generating an IDE for CBS from the definition. It also considers the possibility of supporting component-based language specification in other formal language workbenches.
Oscar H. Ibarra, Ian McQuillan, Bala Ravikumar
We study counting-regular languages -- these are languages $L$ for which there is a regular language $L'$ such that the number of strings of length $n$ in $L$ and in $L'$ is the same for all $n$. We show that the languages accepted by unambiguous nondeterministic Turing machines with a one-way read-only input tape and a reversal-bounded worktape are counting-regular. Many one-way acceptors are special cases of this model, such as reversal-bounded deterministic pushdown automata, reversal-bounded deterministic queue automata, and many others, and therefore all languages accepted by these models are counting-regular. This result is the best possible in the sense that the claim does not hold for $2$-ambiguous PDAs, for unambiguous PDAs with no reversal bound, or for some other models. We also study closure properties of counting-regular languages, as well as decidability problems with regard to counting-regularity. For example, it is shown that counting-regularity is undecidable even for some restricted subclasses of PDAs. Lastly, $k$-slender languages -- where there are at most $k$ words of any length -- are also studied. Among other results, it is shown that it is decidable whether a language in any semilinear full trio is $k$-slender.
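To make the definition concrete: $\{a^n b^n : n \ge 0\}$ is accepted by a deterministic pushdown automaton that makes a single reversal (push on $a$, pop on $b$), and it has exactly one word of every even length and none of odd length, the same counts as the regular language $(aa)^*$. The sketch below checks this by brute force up to a small length bound. The helper names are hypothetical, and a finite check only illustrates counting-regularity; it does not prove it.

```python
import itertools
import re
from typing import Callable, List


def count_by_length(member: Callable[[str], bool], alphabet: str, max_len: int) -> List[int]:
    """Number of words of each length 0..max_len over `alphabet` accepted by `member`."""
    return [
        sum(1 for letters in itertools.product(alphabet, repeat=n) if member("".join(letters)))
        for n in range(max_len + 1)
    ]


def in_anbn(word: str) -> bool:
    """Membership in the non-regular language {a^n b^n : n >= 0}."""
    half = len(word) // 2
    return len(word) % 2 == 0 and word == "a" * half + "b" * half


EVEN_AS = re.compile(r"(?:aa)*")


def in_even_as(word: str) -> bool:
    """Membership in the regular language (aa)*."""
    return EVEN_AS.fullmatch(word) is not None


print(count_by_length(in_anbn, "ab", 6))     # [1, 0, 1, 0, 1, 0, 1]
print(count_by_length(in_even_as, "ab", 6))  # [1, 0, 1, 0, 1, 0, 1]
```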
Nathanaël Fijalkow
This paper studies the complexity of languages of finite words using automata theory. To go beyond the class of regular languages, we consider infinite automata and the notion of state complexity defined by Karp. Motivated by the seminal paper of Rabin from 1963 introducing probabilistic automata, we study the (deterministic) state complexity of probabilistic languages and prove that probabilistic languages can have arbitrarily high deterministic state complexity. We then look at alternating automata as introduced by Chandra, Kozen and Stockmeyer: such machines run independent computations on the word and gather their answers through boolean combinations. We devise a lower bound technique relying on boundedly generated lattices of languages, and give two applications of this technique. The first is a hierarchy theorem, stating that there are languages of arbitrarily high polynomial alternating state complexity, and the second is a linear lower bound on the alternating state complexity of the prime numbers written in binary. This second result strengthens a result of Hartmanis and Shank from 1968, which implies an exponentially worse lower bound for the same model.
Birzhan Moldagaliyev
We define a notion of randomness for individual formal languages and for collections of formal languages, based on automatic martingales acting on sequences of words from some underlying domain. An automatic martingale bets on whether the incoming word belongs to the target language. Randomness of both single languages and collections of languages is then defined as the failure of automatic martingales to gain unbounded capital by betting on the target language according to an incoming sequence of words, or a text. The randomness of formal languages turns out to depend heavily on the text. For very general classes of texts, any nonregular language turns out to be random when considered individually. As for collections of languages, very general classes of texts permit nonrandomness only for automatic families of languages. On the other hand, an arbitrary computable language is shown to be nonrandom under certain dynamic texts.
Alexis Linard, Colin de la Higuera, Frits Vaandrager
A classical problem in grammatical inference is to identify a language from a set of examples. In this paper, we address the problem of identifying a union of languages from examples that belong to several different unknown languages. Indeed, decomposing a language into smaller pieces that are easier to represent should make learning easier than aiming for an overly general language. In particular, we consider k-testable languages in the strict sense (k-TSS). These are defined by a set of allowed prefixes, infixes (substrings), and suffixes that words in the language may contain. We establish a Galois connection between the lattice of all languages over alphabet Σ and the lattice of k-TSS languages over Σ. We also define a simple metric on k-TSS languages. The Galois connection and the metric allow us to derive an efficient algorithm to learn the union of k-TSS languages. We evaluate our algorithm on an industrial dataset and thus demonstrate the relevance of our approach.
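A minimal sketch of inferring a single k-TSS language from positive examples, using the prefix/infix/suffix description given in the abstract. The class name `KTSS`, storing words shorter than k verbatim, and the requirement k ≥ 2 are assumptions of this sketch. The paper's actual contribution, learning a union of k-TSS languages through the Galois connection and metric, is not reproduced here.

```python
from typing import Iterable, Set


class KTSS:
    """Hypothetical k-TSS acceptor inferred from positive examples.

    Assumed formulation: words shorter than k are accepted only if seen verbatim;
    a word of length >= k is accepted iff its (k-1)-prefix, its (k-1)-suffix,
    and every length-k substring all occurred in the training sample.
    """

    def __init__(self, k: int, sample: Iterable[str]) -> None:
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.short: Set[str] = set()
        self.prefixes: Set[str] = set()
        self.suffixes: Set[str] = set()
        self.infixes: Set[str] = set()
        for word in sample:
            self.add(word)

    def add(self, word: str) -> None:
        """Update the allowed prefixes, suffixes, and infixes with one example."""
        k = self.k
        if len(word) < k:
            self.short.add(word)
            return
        self.prefixes.add(word[: k - 1])
        self.suffixes.add(word[-(k - 1):])
        self.infixes.update(word[i : i + k] for i in range(len(word) - k + 1))

    def accepts(self, word: str) -> bool:
        k = self.k
        if len(word) < k:
            return word in self.short
        return (
            word[: k - 1] in self.prefixes
            and word[-(k - 1):] in self.suffixes
            and all(word[i : i + k] in self.infixes for i in range(len(word) - k + 1))
        )


learner = KTSS(2, ["ab", "aab", "aaab"])
print(learner.accepts("aaaaab"))  # True: generalizes to a+b
print(learner.accepts("abab"))    # False: infix "ba" never seen
```

Because acceptance only looks at local windows, the learned language generalizes beyond the sample (here to $a^+b$); the abstract's point is that fitting one such language to examples drawn from several languages tends to over-generalize, which motivates learning a union instead.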
Kostadin Kratchanov, Efe Ergün
Control Network Programming (CNP) is a programming paradigm described by the maxim "Primitives + Control Network = Control Network program". It is a type of graphical programming. The Control Network is a recursive system of graphs; it can be a purely descriptive specification of the problem being solved, so "drawing" the control network does not involve any programming. The Primitives are elementary, easily understandable, and clearly specified actions, but ultimately they have to be programmed. Historically, they have usually been coded in Free Pascal. The actual code of the primitives has never been considered important; the essence of an "algorithm" is represented by its control network. CNP was always meant to be an easy and fast approach to software application development that involves very little real programming. Language interoperability (using different languages in the same software project) is a prominent current trend in software development, and it is even more important and natural for CNP than for other programming paradigms. Here, interoperability means the ability to use primitives written in various programming languages. This report describes our first steps in creating applications with a multi-language set of primitives. Some of the most popular and interesting programming languages have been addressed: Python, Java, and C. We show how to create applications with primitives written in these "non-native" languages, and we consider examples in which primitives written in all four languages (Free Pascal, Python, Java, and C) are used simultaneously (multiple-language CNP). We also discuss CNP programming without programming (language-free CNP).
Giovanna J. Lavado, Giovanni Pighizzini, Luca Prigioniero
Finite automata whose computations can be reversed at any point by knowing the last k symbols read from the input, for a fixed k, are considered. These devices and their accepted languages are called k-reversible automata and k-reversible languages, respectively. The existence of k-reversible languages that are not (k-1)-reversible is known for each k>1. This gives an infinite hierarchy of weakly irreversible languages, i.e., languages that are k-reversible for some k. Conditions characterizing the class of k-reversible languages, for each fixed k, and the class of weakly irreversible languages are obtained. From these conditions, a procedure is described that, given a finite automaton, decides whether the accepted language is weakly or strongly (i.e., not weakly) irreversible. Furthermore, a construction is presented that transforms any finite automaton that is not k-reversible, but accepts a k-reversible language, into an equivalent k-reversible finite automaton.
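In the simplest case, where knowing only the last symbol read must suffice (k = 1 under the abstract's phrasing), the condition reduces to a check on transitions: after reading symbol a into state q, the previous state is determined only if no two distinct states move to q on a. The sketch below flags the (state, symbol) pairs that break this, assuming all states are reachable. The dict encoding of the DFA and the function names are hypothetical. It checks one given automaton for that single case only, whereas the paper handles arbitrary k and properties of the accepted language, which can be k-reversible even when a particular automaton for it is not.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A (possibly partial) DFA transition function: (state, symbol) -> next state.
Transitions = Dict[Tuple[str, str], str]


def irreversible_pairs(delta: Transitions) -> List[Tuple[str, str]]:
    """(state, symbol) pairs that can be entered from two or more different states."""
    predecessors: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for (source, symbol), target in delta.items():
        predecessors[(target, symbol)].add(source)
    return sorted(pair for pair, sources in predecessors.items() if len(sources) > 1)


def is_reversible(delta: Transitions) -> bool:
    """True if the last symbol read always determines the previous state."""
    return not irreversible_pairs(delta)


# Hypothetical example 1: words over {a, b} ending in 'a' (complete DFA, states "0", "1").
ends_in_a = {("0", "a"): "1", ("1", "a"): "1", ("0", "b"): "0", ("1", "b"): "0"}
# Hypothetical example 2: (ab)* as a partial DFA.
ab_star = {("0", "a"): "1", ("1", "b"): "0"}

print(irreversible_pairs(ends_in_a))  # [('0', 'b'), ('1', 'a')]
print(is_reversible(ab_star))         # True
```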
Zakaria Alomari, Oualid El Halimi, Kaushik Sivaprasad et al.
Comparison of programming languages is a common topic of discussion among software engineers. Multiple programming languages are designed, specified, and implemented every year to keep up with changing programming paradigms, hardware evolution, etc. In this paper, we present a comparative study of six programming languages: C++, PHP, C#, Java, Python, and VB. These languages are compared in terms of reusability, reliability, portability, availability of compilers and tools, readability, efficiency, familiarity, and expressiveness.
Page 11 of 16,032