On the Complexity of Language Membership for Probabilistic Words
Antoine Amarilli, Mikaël Monet, Paul Raphaël
et al.
We study the membership problem for context-free languages L (CFLs) on probabilistic words, which specify for each position a probability distribution on the letters (assuming independence across positions). Our task is to compute, given a probabilistic word, the probability that a word drawn according to the distribution belongs to L. This problem generalizes the problem of counting how many words of length n belong to L, or of counting how many completions of a partial word belong to L. We show that this problem is in polynomial time for unambiguous context-free languages (uCFLs), but can be #P-hard already for unions of two linear uCFLs. More generally, we show that the problem is in polynomial time for so-called poly-slicewise-unambiguous languages, where given a length n we can tractably compute a uCFL for the words of length n in the language. This class includes some inherently ambiguous languages, and implies the tractability of bounded CFLs and of languages recognized by unambiguous polynomial-time counter automata; but we show that the problem can be #P-hard for nondeterministic counter automata, even for Parikh automata with a single counter. We then introduce classes of circuits from knowledge compilation which we use for tractable counting, and show that this covers the tractability of poly-slicewise-unambiguous languages and of some CFLs that are not poly-slicewise-unambiguous. Extending these circuits with negation further allows us to show tractability for the language of primitive words, and for the language of concatenations of two palindromes. We finally show the conditional undecidability of the meta-problem that asks, given a CFG, whether the probabilistic membership problem for that CFG is tractable or #P-hard.
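To make the problem statement concrete, here is a minimal sketch for the simplest tractable case only: a regular language given by a complete DFA (every regular language is an unambiguous CFL). It is not the algorithm of the paper. The function name `membership_probability`, the dictionary encoding of the DFA, and the example language (words with an even number of `a`) are all assumptions made for illustration.

```python
from fractions import Fraction


def membership_probability(prob_word, transitions, start, accepting):
    """Probability that a word drawn from `prob_word` is accepted by a DFA.

    prob_word   -- list with one dict per position, mapping letter -> probability
    transitions -- dict mapping (state, letter) -> state (complete DFA, assumed)
    """
    # dist[state] = probability of being in `state` after reading a prefix
    dist = {start: Fraction(1)}
    for letter_dist in prob_word:
        nxt = {}
        for state, p in dist.items():
            for letter, q in letter_dist.items():
                target = transitions[(state, letter)]
                # Positions are independent, so probabilities multiply.
                nxt[target] = nxt.get(target, Fraction(0)) + p * q
        dist = nxt
    return sum((p for s, p in dist.items() if s in accepting), Fraction(0))


# Hypothetical example language: words over {a, b} with an even number of 'a'.
EVEN_A = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}

half = Fraction(1, 2)
uniform_word = [{"a": half, "b": half}] * 3
p = membership_probability(uniform_word, EVEN_A, "even", {"even"})
print(p)          # probability of membership
print(p * 2**3)   # with uniform letters, this is the number of accepted words of length 3
```

With a uniform distribution at every position, multiplying the result by the number of words of that length recovers the counting problem that the abstract says is generalized.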
Reactive Semantics for User Interface Description Languages
Basile Pesin, Celia Picard, Cyril Allignol
User Interface Description Languages (UIDLs) are high-level languages that facilitate the development of Human-Machine Interfaces, such as Graphical User Interface (GUI) applications. They usually provide first-class primitives to specify how the program reacts to an external event (user input, network message), and how data flows through the program. Although these domain-specific languages are now widely used to implement safety-critical GUIs, little work has been invested in their formalization and verification. In this paper, we propose a denotational semantic model for a core reactive UIDL, Smalite, which we argue is expressive enough to encode constructs from more realistic languages. This preliminary work may be used as a stepping stone to produce a formally verified compiler for UIDLs.
Various Types of Comet Languages and their Application in External Contextual Grammars
Marvin Ködding, Bianca Truthe
In this paper, we continue the research on the power of contextual grammars with selection languages from subfamilies of the family of regular languages. We investigate various comet-like types of languages and compare such language families to some other subregular families of languages (finite, monoidal, nilpotent, combinational, (symmetric) definite, ordered, non-counting, power-separating, suffix-closed, commutative, circular, or union-free languages). Further, we compare the language families defined by these types for the selection with each other and with the families of the hierarchy obtained for external contextual grammars. In this way, we extend the existing hierarchy by new language families.
Image Caption Synthesis for Low Resource Assamese Language using Bi-LSTM with Bilinear Attention
Pankaj Choudhury, Prithwijit Guha, Sukumar Nandi
5 citations
en
Computer Science
Modelling admixture across language levels to evaluate deep history claims
Nataliia Hübler, Simon J. Greenhill
The so-called ‘Altaic’ languages have been the subject of debate for over 200 years. An array of different data sets has been used to investigate the genealogical relationships between them, but the controversy persists. A new type of data with high potential for such cases in historical linguistics is structural features, which are sometimes declared prone to borrowing and discarded from the outset, and at other times considered to carry an especially precise historical signal reaching further back in time than other types of linguistic data. We investigate the performance of typological features across different domains of language by using an admixture model from genetics. As implemented in the software STRUCTURE, this model allows us to account for both a genealogical and an areal signal in the data. Our analysis shows that morphological features have the strongest genealogical signal and syntactic features diffuse most easily. When using only morphological structural data, the model is able to correctly identify three language families: Turkic, Mongolic, and Tungusic, whereas Japonic and Koreanic languages are assigned the same ancestry.
Exploring BERT Models for Part-of-Speech Tagging in the Algerian Dialect: A Comprehensive Study
M. A. Chéragui, Abdelhalim Hafedh Dahou, Amin Abdedaiem
3 citations
en
Computer Science
Ilukirjanduse lugemine mängustatud aktiivõppemeetodite abil põhikooli eesti keele kui teise keele tundides
Mare Kitsnik, Svetlana Melnikova
"Reading fiction using gamified active learning methods in eighth grade Estonian as a second language classes".
Acquiring a good level of Estonian as a second language is very important for secondary school students if they are to continue their studies successfully and integrate smoothly into Estonian society. To encourage students’ desire to learn and to increase the effectiveness of learning, activity, engagement, and affordability are very important in language classes (Kitsnik 2018b). Reading is one of the four skills developed in second language classes. The aim of this action research was to develop more effective learning through gamified teaching methods. Based on this aim, the research question was whether more interesting and systematic reading classes can make reading books in Estonian more pleasant for students with other mother tongues and support the development of their language skills.
In the action research, the existing reading classes of one grade 8 study group (n = 15) were methodically redeveloped. Fifteen deliberately structured reading classes were compiled based on Andrus Kivirähk’s book “Tont ja Facebook” (“The Ghost and Facebook”, 2019). The plan was based on gamified teaching methods (Kitsnik 2019a, 2019b; Razin & Kingisepp 2018; Kingisepp & Kärtner 2011 etc.). Gamification offers challenges, creates excitement, frees students from the rules and restrictions of everyday life, and offers fun and energy (Kitsnik 2019a).
Functions of the conjunction “ne…ne…” in the Kyrgyz Turkic based on the example of Tologon Kasımbekov’s works
Meder SALİEV
This study examines the usage characteristics of the conjunction “ne…ne…” in the Kyrgyz language. The sources we examined state that the conjunction “ne…ne…” is not commonly used in written Kyrgyz but is mostly used in the southern dialects of Kyrgyz. Since there are no examples of the conjunction in recent written Kyrgyz texts, this study identified sentences containing it in two historical novels by Tölögön Kasımbekov, “Kelkel” and “Sıngan Kılıç”, in which it is commonly used. Analysis of these sentences showed that, besides the meanings “biri” (one) and “hepsi” (all) with respect to the elements compared in the sentence, the conjunction is also used in the meanings of “hem…hem” (both…and), “da…de…” (also), “ya…ya” (either…or) and “mı…mı” (question conjunction). In addition, brief information is given on evaluations in earlier studies of the usage and origin of the conjunction “ne”. In these studies, some researchers regard the conjunction as Persian in origin, while others regard it as Turkic. Yudahin stated that two different “ne…ne…” conjunctions are used in Kyrgyz: one is the “ne” conjunction from Persian, and the other is the Turkic “ne” conjunction meaning “ne…ne” (either…or). Radloff stated that, in the northern dialects of Turkic, the predicate at the end of sentences containing the conjunction “ne…ne…” takes the negation suffix, whereas in the southern dialects, as in Persian, the predicate of such sentences is in the positive form.
Murataliyev stated that the origin of this conjunction is Persian, but also that it was transferred into Kyrgyz via Uzbek and Tatar. Based on this information, examination of the sentences in the novels in which the conjunction “ne…ne…” is used showed that most of the predicates carry a negative suffix. The views of Deny and Ediskun on whether the verb can be either positive or negative in sentences with the conjunction “ne…ne…” are also given extensively in the remainder of the study.
Language and Literature, Ural-Altaic languages
Looduskultuuri mur(d)epunkte. Meie antropotseen
Elle-Mari Talivee
The article compares birdwatchers’ experience of nature with the natural environment as conveyed in literary fiction. The source materials comprise, first of all, interviews conducted with (mostly amateur) Estonian birdwatchers and, secondly, contemporary Estonian and Swedish literary fiction: Maarja Pärtna’s prose poetry collection “The Living City” (Elav linn, 2022), Andrus Kivirähk’s book “Flight to the Moon” (Lend Kuule, 2022), Tõnis Tootsen’s novel “Pâté of the Apes: One Primate’s Thoughts and Memories” (Ahvide pasteet, 2022) and Kerstin Ekman’s novel “The Wolf Run” (Löpa varg, 2021; Estonian translation 2022). The focus is on whether and how the concerns of nature observers relate to anxiety about changes in the natural environment as expressed in contemporary literature. Amateur environmental science projects aim to draw attention to concerns about the natural environment and climate change and thereby strive for a smaller personal environmental impact. Eco-fiction, in turn, puts environmental issues into words, setting them into a fathomable, although perhaps altogether unexpected scale: hence, eco-prose and -poesy are essential ways of perceiving that also serve to interpret the ongoing changes.
Environment-oriented literary culture has responded to issues with the natural environment before. Now, too, it can be concluded that environmental concerns have made a forceful entry into literature. All of the above-mentioned authors have found their unique way of conceptualizing our home in an era of environmental crises. Their recently published works tell stories about our surroundings and interpret the present situation; they discuss the anthropocentric viewpoint or depict the human focus from an unexpected perspective; they draw attention to our alienation from nature; they reposition the reader and thereby seek solutions to environmental issues. Among other things, they highlight environmentally friendly ways of living and experiencing the world or look at the world through non-human eyes, thus bringing the narrator closer to other forms of being. The writers share with birdwatchers the post-humanist idea of the equality of species as well as a sharp eye for their subject.
Other Finnic languages and dialects
Examination of some verbs that are used continuously in Bekilli district of Denizli province
Olcay Güntülü ÖZAKAYDIN
Bekilli district is located in the northern part of Denizli province. Bekilli, which was part of Çal district until 1987, gained district status after that date. In the 13th century, the Oghuz tribes of Kayı, Avşar, Yazır, Bayat, Beydilli, Çavuldur and Eymir were settled in the Bekilli, Çal, Çivril region. As a result, Oghuz Turkish is dominant in this region. Research places Denizli province and its districts within the borders of the Western Group Dialects. The phonetic and morphological features of the Western Group Dialects are largely observed in the daily spoken language of the people of Bekilli. Some verbs that are used frequently in daily life are spoken and written in a form specific to this region. In fact, these verbs are either not included in the Derleme Sözlüğü and the Tarama Sözlüğü or appear there in different forms. Examination of the roots and suffixes of these verbs shows that they have long existed in the language, but that their use is limited to rural areas such as Bekilli. For example, the verb "hamaşmak" is frequently used in this district in the sense of "to hug, to embrace". However, this verb does not appear in the Derleme Sözlüğü.
Language and Literature, Ural-Altaic languages
Capital-Abdulhamid or impressions of Uzbek poet Zevkî from İstanbul travel
Murad Halmet
During the reign of Yavuz Sultan Selim, İstanbul became the center of the caliphate, and the pilgrimage route of the Turkestanis was redirected through İstanbul. They stayed there for a while, visited the caliph sultan whenever possible and paid their respects. The Ottoman state ensured the safety of pilgrims travelling from Turkestan; pilgrims were therefore escorted on the pilgrimage by Ottoman Turkic soldiers. At the beginning of the 20th century, one of those who went on pilgrimage through İstanbul was the Uzbek poet Zevkî. Following tradition, Zevkî first came to İstanbul, the seat of the caliphate, and stayed there for a while. During his stay, he toured İstanbul, which overlooks the Bosphorus, and was fascinated by its beauty. Later, he wrote a poem presenting the beautiful landscapes and scenes he had seen to the people of Turkestan. The poem is of great importance for introducing İstanbul, the capital of the Western Turkic peoples, through description. The poet painted a portrait of İstanbul in words, almost like a painter. The subjects reflected in the poem are the beauty and nature of the capital İstanbul, the mosques of the city, and the ruler of the period, the caliph of the Muslim peoples, Sultan Abdulhamid Han II: his works for the development of his homeland, nation and Islam, his relationship with his people, and his people's love for him.
Language and Literature, Ural-Altaic languages
Elu nagu algebra. Alide Erteli elu ja looming
Taimi Grauberg
Alide Ertel (1877–1955) was an Estonian woman writer active in the early 20th century. The most significant factors influencing her creative path were being born into a wealthy South Estonian family of farmers, her good education, and traveling not only within the Tsarist Empire but also in Western Europe. The active participation of Ertel’s family in the public life also played an important part. Ertel herself was involved in politics, taking part in the 1905 revolution as well as the events of 1917, which can be considered important factors shaping Ertel’s life, work, and its reception. In addition to giving fiery speeches during the revolutionary events, Ertel took a strong stance on issues like popular education, agriculture, and the economic well-being of cultural figures.
At the beginning of her creative career, Ertel published her works primarily in the print media. 1910 saw the publication of her debut novella Rooste (“Rust”), which depicted the residents and conditions of a local poorhouse. Rooste received favourable reviews and is considered, according to later assessments, Ertel’s best work. From 1919 to 1920, Ertel dedicated herself to literary work and produced two plays, a collection of short stories published in two editions, a collection of aphorisms, and a historical novel within a short period of time. All works from this period had a negative reception, which can be attributed to inadequate linguistic editing, going against the literary circles of the time, and a preconceived bias stemming from this opposition as well as Ertel’s association with Bolshevism. The complex publishing market situation during that time must also be considered. Consequently, Ertel withdrew from literary activities. Between 1929 and 1931, she published a book of fairy tales and two plays, but these works also failed to garner the attention she hoped for.
Although Ertel’s creative work has remained lodged in the time of its publication, it is worth exploring also for the contemporary reader. Ertel has depicted the aspirations of marginalized members of society and focused heightened attention on the position of women in the society and their possibilities for self-actualization.
Other Finnic languages and dialects
Büchi-like characterizations for Parikh-recognizable omega-languages
Mario Grobler, Sebastian Siebertz
Büchi's theorem states that $ω$-regular languages are characterized as languages of the form $\bigcup_i U_i V_i^ω$, where $U_i$ and $V_i$ are regular languages. Parikh automata are automata on finite words whose transitions are equipped with vectors of positive integers, whose sum can be tested for membership in a given semi-linear set. We give an intuitive automata theoretic characterization of languages of the form $U_i V_i^ω$, where $U_i$ and $V_i$ are Parikh-recognizable. Furthermore, we show that the class of such languages, where $U_i$ is Parikh-recognizable and $V_i$ is regular is exactly captured by a model proposed by Klaedtke and Ruess [Automata, Languages and Programming, 2003], which again is equivalent to (a small modification of) reachability Parikh automata introduced by Guha et al. [FSTTCS, 2022]. We finish this study by introducing a model that captures exactly such languages for regular $U_i$ and Parikh-recognizable $V_i$.
Regular Methods for Operator Precedence Languages
Thomas A. Henzinger, Pavol Kebis, Nicolas Mazzocchi
et al.
The operator precedence languages (OPLs) represent the largest known subclass of the context-free languages which enjoys all desirable closure and decidability properties. This includes the decidability of language inclusion, which is the ultimate verification problem. Operator precedence grammars, automata, and logics have been investigated and used, for example, to verify programs with arithmetic expressions and exceptions (both of which are deterministic pushdown but lie outside the scope of the visibly pushdown languages). In this paper, we complete the picture and give, for the first time, an algebraic characterization of the class of OPLs in the form of a syntactic congruence that has finitely many equivalence classes exactly for the operator precedence languages. This is a generalization of the celebrated Myhill-Nerode theorem for the regular languages to OPLs. As one of the consequences, we show that universality and language inclusion for nondeterministic operator precedence automata can be solved by an antichain algorithm. Antichain algorithms avoid determinization and complementation through an explicit subset construction, by leveraging a quasi-order on words, which allows the pruning of the search space for counterexample words without sacrificing completeness. Antichain algorithms can be implemented symbolically, and these implementations are today the best-performing algorithms in practice for the inclusion of finite automata. We give a generic construction of the quasi-order needed for antichain algorithms from a finite syntactic congruence. This yields the first antichain algorithm for OPLs, an algorithm that solves the \textsc{ExpTime}-hard language inclusion problem for OPLs in exponential time.
LPR: Large Language Models-Aided Program Reduction
Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu
et al.
Program reduction is a prevalent technique to facilitate compiler debugging by automatically minimizing bug-triggering programs. Existing program reduction techniques are either generic across languages (e.g., Perses and Vulcan) or specifically customized for one certain language by employing language-specific features, like C-Reduce. However, striking the balance between generality across multiple programming languages and specificity to individual languages in program reduction is yet to be explored. This paper proposes LPR, the first technique utilizing LLMs to perform language-specific program reduction for multiple languages. The core insight is to utilize both the language-generic syntax level program reduction (e.g., Perses) and the language-specific semantic level program transformations learned by LLMs. Alternately, language-generic program reducers efficiently reduce programs into 1-tree-minimality, which is small enough to be manageable for LLMs; LLMs effectively transform programs via the learned semantics to expose new reduction opportunities for the language-generic program reducers to further reduce the programs. Our extensive evaluation on 50 benchmarks across three languages (C, Rust, and JavaScript) has highlighted LPR's practicality and superiority over Vulcan, the state-of-the-art language-generic program reducer. For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on benchmarks in C, Rust and JavaScript, respectively. Moreover, LPR and Vulcan have demonstrated their potential to complement each other. By using Vulcan on LPR's output for C programs, we achieve program sizes comparable to those reduced by C-Reduce. For efficiency, LPR takes 10.77%, 34.88%, and 36.96% less time than Vulcan to finish all benchmarks in C, Rust and JavaScript, respectively.
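As a rough illustration of what a language-generic reducer does, the sketch below greedily deletes single elements from a flat list of lines until no single deletion keeps the bug-triggering property. This is a simplified list-based analogue, not Perses, Vulcan or LPR, which operate on parse trees; the names `reduce_one_minimal` and `still_fails`, and the toy program, are assumptions made for illustration.

```python
def reduce_one_minimal(items, still_fails):
    """Greedily delete single items while `still_fails` stays true.

    items       -- list of program fragments (here: source lines)
    still_fails -- hypothetical oracle: True if the candidate still triggers the bug
    The result is 1-minimal with respect to single deletions: removing
    any one remaining item makes the oracle return False.
    """
    current = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the smaller program
    return current


# Toy oracle (assumption): the "bug" needs both the declaration and the call.
def triggers_bug(lines):
    return "int a;" in lines and "crash(a);" in lines


program = ["int a;", "int b;", "crash(a);", "int c;"]
reduced = reduce_one_minimal(program, triggers_bug)
print(reduced)
```

In the alternating scheme the abstract describes, a step like this would be interleaved with LLM-proposed semantic transformations that expose further deletions; that part is not modelled here.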
Swap distance minimization in SOV languages. Cognitive and mathematical foundations
Ramon Ferrer-i-Cancho, Savithry Namboodiripad
Distance minimization is a general principle of language. A special case of this principle in the domain of word order is swap distance minimization. This principle predicts that variations from a canonical order that are reached by fewer swaps of adjacent constituents are less costly and thus more likely. Here we investigate the principle in the context of the triple formed by subject (S), object (O) and verb (V). We introduce the concept of word order rotation as a cognitive underpinning of that prediction. When the canonical order of a language is SOV, the principle predicts SOV < SVO, OSV < VSO, OVS < VOS, in order of increasing cognitive cost. We test the prediction in three flexible order SOV languages: Korean (Koreanic), Malayalam (Dravidian), and Sinhalese (Indo-European). Evidence of swap distance minimization is found in all three languages, but it is weaker in Sinhalese. Swap distance minimization is stronger than a preference for the canonical order in Korean and especially Malayalam.
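The predicted ranking can be reproduced with a short calculation. The minimum number of swaps of adjacent constituents needed to turn one order into another equals the number of inversions between them, so the sketch below counts inversions relative to the canonical order. The function name `swap_distance` is an assumption made for illustration.

```python
def swap_distance(order, canonical="SOV"):
    """Minimum number of adjacent swaps turning `canonical` into `order`.

    Computed as the number of inverted pairs of constituents
    relative to the canonical order.
    """
    position = {symbol: i for i, symbol in enumerate(canonical)}
    ranks = [position[symbol] for symbol in order]
    return sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )


ORDERS = ["SOV", "SVO", "OSV", "VSO", "OVS", "VOS"]
distances = {order: swap_distance(order) for order in ORDERS}
for order in ORDERS:
    print(order, distances[order])
```

Grouping the six orders by this distance gives SOV at 0, SVO and OSV at 1, VSO and OVS at 2, and VOS at 3, which matches the ordering SOV < SVO, OSV < VSO, OVS < VOS stated in the abstract.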
Foreword / Eessõna / Eḑḑisõnā
Valts Ernštreits, Karl Pajusalu
Philology. Linguistics, Finnic. Baltic-Finnic
BADIIY MATNDA UCHRAYDIGAN G‘AYRIODATIY BIRIKMALARNING LINGVOPOETIK XUSUSIYATLARI
Munira Akbarova
The introduction of the article places particular emphasis on the relationship between the literary work and language. The most important feature distinguishing a literary work from an ordinary text is its artistic language. Literature is the art of the word. Like all texts, a literary work consists of words, but the words that make up a literary work are combined to reflect the artistic and aesthetic conception of the poet or author, and in this way the literary work is set apart from other texts. Every literary work reflects, to some degree, the aesthetic level of the “language” in which it is written. The depth of meaning, artistic quality, and aesthetic level of a literary text depend in part on the unusual semantic combinations and word groups it contains and on how frequently they occur. The author's use of unusual semantic associations and word groups in a literary text makes the text more attractive to the reader. At the same time, it broadens the reader's imagination, opens up their worldview, and engages the reader actively. This is because the author has not stated plainly, in the words everyone uses, what he wants to say; it is hidden behind unconventional expressions. Uncovering it is up to the reader. The mind of a person reading such a work must be constantly at work throughout the reading. The remainder of the article presents and comments on examples of unusual semantic combinations and word groups from the works of Said Ahmad, Tohir Malik, and O‘tkir Hoshimov, representatives of 20th-century Uzbek literature.
Rust: The Programming Language for Safety and Performance
William Bugden, Ayman Alahmar
Rust is a young programming language gaining increased attention from software developers since it was introduced to the world by Mozilla in 2010. In this study, we attempt to answer several research questions. Does Rust deserve such increased attention? What is there in Rust that is attracting programmers to this new language? Safety and performance were among the very first promises of Rust, as was claimed by its early developers. Is Rust a safe language with high performance? Have these claims been achieved? To answer these questions, we surveyed and analyzed recent research on Rust and research that benchmarks Rust with other available prominent programming languages. The results show that Rust deserves the increased interest by programmers, and recent experimental results in benchmarking research show Rust's overall superiority over other well-established languages in terms of performance, safety, and security. Even though this study was not comprehensive (and more work must be done in this area), it informs the programming and research communities on the promising features of Rust as the language of choice for the future.
Formally Verified Native Code Generation in an Effectful JIT -- or: Turning the CompCert Backend into a Formally Verified JIT Compiler
Aurèle Barrière, Sandrine Blazy, David Pichardie
Modern Just-in-Time compilers (or JITs) typically interleave several mechanisms to execute a program. For faster startup times and to observe the initial behavior of an execution, interpretation can be initially used. But after a while, JITs dynamically produce native code for parts of the program they execute often. Although some time is spent compiling dynamically, this mechanism makes for much faster times for the remainder of the program execution. Such compilers are complex pieces of software with various components, and rely heavily on a precise interplay between the different languages being executed, including on-stack-replacement. Traditional static compilers like CompCert have been mechanized in proof assistants, but JITs have been scarcely formalized so far, partly due to their impure nature and their numerous components. This work presents a model JIT with dynamic generation of native code, implemented and formally verified in Coq. Although some parts of a JIT cannot be written in Coq, we propose a proof methodology to delimit, specify and reason on the impure effects of a JIT. We argue that the daunting task of formally verifying a complete JIT should draw on existing proofs of native code generation. To this end, our work successfully reuses CompCert and its correctness proofs during dynamic compilation. Finally, our prototype can be extracted and executed.