Turn Complexity of Context-free Languages, Pushdown Automata and One-Counter Automata
Giovanni Pighizzini
A turn in a computation of a pushdown automaton is a switch from a phase in which the height of the pushdown store increases to a phase in which it decreases. Given a pushdown or one-counter automaton, we consider, for each string in its language, the minimum number of turns made in accepting computations. We prove that it cannot be decided whether this number is bounded by a constant. Furthermore, we obtain a non-recursive trade-off between pushdown and one-counter automata accepting in a finite number of turns and finite-turn pushdown automata, which are defined by requiring that the constant bound is satisfied by each accepting computation. We prove that there are languages accepted in a sublinear but not constant number of turns, with respect to the input length. Furthermore, there exists an infinite proper hierarchy of complexity classes, with the number of turns bounded by different sublinear functions. In addition, there is a language requiring a number of turns which is not constant but grows slower than each of the functions defining the above hierarchy.
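The definition of a turn given in this abstract can be illustrated with a minimal sketch. It assumes a computation is summarized as the list of pushdown-store heights after each step; the function name `count_turns` and this representation are illustrative assumptions, not taken from the paper.

```python
def count_turns(heights):
    """Count switches from an increasing phase to a decreasing phase.

    `heights` is a hypothetical trace: the pushdown-store height after
    each step of one computation.
    """
    turns = 0
    rising = False  # True while the most recent height change was an increase
    for prev, curr in zip(heights, heights[1:]):
        if curr > prev:
            rising = True
        elif curr < prev:
            if rising:
                turns += 1  # an increasing phase has just ended
            rising = False
        # equal heights leave the current phase unchanged
    return turns


# A trace for "aabb" on the usual automaton for a^n b^n: push, push, pop, pop.
one_turn = count_turns([0, 1, 2, 1, 0])
# A trace that pushes and pops twice.
two_turns = count_turns([0, 1, 0, 1, 0])
```

Under this reading, the quantity studied in the abstract is the minimum of `count_turns` over all accepting computations on a given string.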
Graph Rewriting Language as a Platform for Quantum Diagrammatic Calculi
Kayo Tei, Haruto Mishina, Naoki Yamamoto
et al.
Systematic discovery of optimization paths in quantum circuit simplification remains a challenge. Today, ZX-calculus, a computing model for quantum circuit transformation, is attracting attention for its highly abstract graph-based approach. Whereas existing tools such as PyZX and Quantomatic offer domain-specific support for quantum circuit optimization, visualization and theorem-proving, we present a complementary approach using LMNtal, a general-purpose hierarchical graph rewriting language, to establish a diagrammatic transformation and verification platform with model checking. Our methodology shows three advantages: (1) manipulation of ZX-diagrams through native graph transformation rules, enabling direct implementation of basic rules; (2) quantified pattern matching via QLMNtal extensions, greatly simplifying rule specification; and (3) interactive visualization and validation of optimization paths through state space exploration. Through case studies, we demonstrate how our framework helps understand optimization paths and design new algorithms and strategies. This suggests that the declarative language LMNtal and its toolchain could serve as a new platform to investigate quantum circuit transformation from a different perspective.
Keelekorpus kui leksikograafi abiline kõnekeelsuse tuvastamisel
Lydia Risberg, Maria Tuulik, Margit Langemets
et al.
Using corpus data to support lexicographers in identifying informal language
This study examines how new corpus analysis tools can assist lexicographers in determining whether to assign a word an informal register label in a dictionary. Labelling words in dictionaries is necessary for language users seeking register information. Moreover, there have been calls for the upcoming Dictionary of Standard Estonian (DSE, 2025) to clearly distinguish standard language from other linguistic varieties.
Informal language was chosen for analysis because it is more difficult to define than other marked registers. In DSE 2018, some words were labelled as informal based on language planning decisions rather than empirical analysis. As register labels should be data-driven and based on corpus evidence, a systematic review of these words is necessary for the revised edition.
Our study investigates how corpus genre data can support lexicographers in deciding whether to add or remove the informal label. We found that corpus data provided useful insights in 82.1% of cases. Based on our experiment, we developed a guideline to assist in labelling word meanings as informal. Namely, if a word occurs in blogs and forums in 36% or more of its total corpus occurrences, it may be considered as tending towards informal usage. This guideline is not a rigid rule but a supportive tool, as additional factors should be considered based on the lexicographer’s linguistic expertise.
Users value reliable linguistic information in dictionaries. Our proposed guideline helps lexicographers make more systematic decisions while maintaining expert judgment as the ultimate determinant.
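The 36% guideline described in this abstract amounts to a simple share calculation, sketched below. The genre names, the sample counts, and the function names are hypothetical; the abstract specifies only the threshold and that it applies to blogs and forums, and it stresses that the result supports rather than replaces the lexicographer's judgment.

```python
INFORMAL_SHARE_THRESHOLD = 0.36  # threshold stated in the abstract


def informal_share(genre_counts, informal_genres=("blogs", "forums")):
    """Return the fraction of a word's corpus occurrences found in blogs and forums.

    `genre_counts` maps a genre label to an occurrence count; the labels
    used here are assumptions for illustration.
    """
    total = sum(genre_counts.values())
    if total == 0:
        return 0.0  # no evidence in the corpus
    informal = sum(genre_counts.get(genre, 0) for genre in informal_genres)
    return informal / total


def tends_informal(genre_counts):
    """Apply the guideline: flag the word for review when the share reaches the threshold."""
    return informal_share(genre_counts) >= INFORMAL_SHARE_THRESHOLD


# Invented counts for two hypothetical words.
word_a = {"blogs": 30, "forums": 10, "news": 60}
word_b = {"blogs": 5, "forums": 5, "news": 90}
flag_a = tends_informal(word_a)
flag_b = tends_informal(word_b)
```

A positive flag would only suggest that the word tends towards informal usage; the labelling decision itself remains with the lexicographer.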
Other Finnic languages and dialects
Kolonisatsioon ja kohanimed. Abhaasia eestlaste toponüümiast
Aivar Jürgenson
Colonization and place names: On the toponymy of the Estonians of Abkhazia
The Estonian villages of Salme, Sulevi, Estonia, and Linda were established along the Black Sea coast of the Caucasus in the 1880s. This was part of a broader migration movement that began in the mid-19th century, following the implementation of peasant laws and passport reforms that allowed peasants to leave their home provinces. Key push factors included demographic transition, overpopulation, and land shortages, while Russian imperial policies encouraged colonization in the southern and eastern regions of the empire. Estonians settled in Abkhazia after the Russo-Turkish War (1877–1878), a period during which much of the local population was exiled to the Ottoman Empire. The process of colonization involved the renaming of places, a practice undertaken both by central government officials and the settlers themselves. This article examines how Estonians named their new settlements and the ideological considerations that shaped these naming practices. The colonists drew inspiration from several sources, including place names from their homeland (such as the village name Estonia and farm names), features of the local environment, and figures from Estonian pseudomythology (e.g., Salme, Sulevi, Linda), which was highly popular during Estonia’s national awakening movement at the time. The microtoponymy created by Estonians reflected practical needs to designate key locations for daily life, such as mountains, fields, and forests. In general, the settlers disregarded the preexisting toponymy of the land, especially in the villages of Salme and Sulevi in northwestern Abkhazia, where the indigenous population had been forcibly removed as early as the 1860s – two decades before the arrival of Estonian migrants. As a result, the names given by Estonians also reflect the cultural rupture of the colonial era.
Other Finnic languages and dialects
Regilaulu variatsioonid tänapäeva Eestis. Koodi jätkamine
Taive Särg, Janika Oras
Variations of runosong in contemporary Estonia: Continuing a code of singing
This article provides an overview of the contemporary, largely revitalized, and multi-layered runosong tradition within the context of its historical development, showing how the ancient Finnic song heritage – nearly extinct by the end of the 19th century – began to revive in the second half of the 20th century through earlier documentation, surviving peripheral traditions, and the postmodern re-evaluation of folk music.
The focus lies on the social aspects of the 21st-century runosong tradition – its functions, contexts, and social organization. Contemporary runosong performances can be grouped into: (1) tradition-related singing integrated into ritual or other functional contexts; (2) non-ritual participatory singing with alternating lead and responding chorus; (3) unarranged stage performances, often involving audience participation; and (4) arranged performances that merge runosong with other musical styles. These forms influence one another and often draw on shared song sources.
Runosong singing represents an alternative to mainstream modern culture and therefore often serves as a vehicle of identity and expression for smaller communities and nations – particularly those centred on the preservation of their culture, language, and environment. As a participatory form of music-making, runosong offers opportunities for distinctive self-expression and aesthetic experience, for transformative or healing engagement, as accompaniment to rituals and movements, and as a means of broadening cultural horizons.
As the expressive form and content of runosong have developed in close connection with various aspects of everyday life over a long period, the style has retained its ability to adapt to changing conditions. Thus, runosong can be understood as a code – a framework shaped by its performers and tradition-bearers, characterized by variable structural and semantic features and capable of conveying multiple layers of meaning. Its presence in both participatory and staged forms demonstrates the vitality and continued significance of singing traditions in contemporary Estonia.
Other Finnic languages and dialects
Folklore style in Azerbaijani children’s literature of the early 20th century
XURAMAN
In the early 20th century, the enlightenment movement in Azerbaijan expanded significantly, providing a strong impetus for the emergence and development of children’s literature in the country. The educators and writers of that period made invaluable contributions to the process of national self-consciousness through their literary creativity. By drawing on various genres and themes from Azerbaijani folklore, they created some of the most notable examples of children’s literature. Enduring exemplars of national children’s poetry, rooted in oral folk traditions, began to emerge during this period. Traditional genres such as tapmaca (riddles), layla, sanama (counting-out rhyme), duzgu, yanıltmac (alliteration), and qaravalli were among the forms most frequently employed by children’s literature authors. Epics, folk tales, legends, and narratives provided the best thematic source material for their creative works. Each of these works continues to attract young readers today in terms of language and style, form and content, rhythm, as well as overall appeal. At that time, there were virtually no literary works available for classroom use in Azerbaijani schools that met pedagogical requirements and resonated with the interests and thought processes of children. The educators and writers of the period primarily composed the first children’s works for use in the educational process. These literary examples predominantly address moral and didactic themes, promote national and spiritual values, and contribute to the enrichment of one’s worldview. Mikayıl Müşfiq, Abdulla Şaiq, Abbas Səhhət, Mirzə Ələkbər Sabir, and others are counted among the authors of the earliest children’s works written in the folklore style.
Language and Literature, Ural-Altaic languages
Polymorphic Records for Dynamic Languages
Giuseppe Castagna, Loïc Peyrot
We define and study "row polymorphism" for a type system with set-theoretic types, specifically union, intersection, and negation types. We consider record types that embed row variables and define a subtyping relation by interpreting types into sets of record values and by defining subtyping as the containment of interpretations. We define a functional calculus equipped with operations for field extension, selection, and deletion, its operational semantics, and a type system that we prove to be sound. We provide algorithms for deciding the typing and subtyping relations. This research is motivated by the current trend of defining static type systems for dynamic languages and, in our case, by an ongoing effort to endow the Elixir programming language with a gradual type system.
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana
et al.
Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.
Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025)
Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson
et al.
The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in conjunction with the 31st International Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, United Arab Emirates. This workshop mainly aimed to provide a forum for researchers to share and discuss their ongoing work on language models (LMs) focusing on low-resource languages, following the recent advancements in neural language models and their linguistic biases towards high-resource languages. LoResLM 2025 attracted notable interest from the natural language processing (NLP) community, resulting in 35 accepted papers from 52 submissions. These contributions cover a broad range of low-resource languages from eight language families and 13 diverse research areas, paving the way for future possibilities and promoting linguistic inclusivity in NLP.
Statically Contextualizing Large Language Models with Typed Holes
Andrew Blinn, Xiang Li, June Hyung Kim
et al.
Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate context, particularly when working with definitions that are neither in the training data nor near the cursor. This paper demonstrates that tight integration with the type and binding structure of a language, as exposed by its language server, can address this contextualization problem in a token-efficient manner. In short, we contend that AIs need IDEs, too! In particular, we integrate LLM code generation into the Hazel live program sketching environment. The Hazel Language Server identifies the type and typing context of the hole being filled, even in the presence of errors, ensuring that a meaningful program sketch is always available. This allows prompting with codebase-wide contextual information not lexically local to the cursor, nor necessarily in the same file, but that is likely to be semantically local to the developer's goal. Completions synthesized by the LLM are then iteratively refined via further dialog with the language server. To evaluate these techniques, we introduce MVUBench, a dataset of model-view-update (MVU) web applications. These applications serve as challenge problems due to their reliance on application-specific data structures. We find that contextualization with type definitions is particularly impactful. After introducing our ideas in the context of Hazel, we duplicate our techniques and port MVUBench to TypeScript in order to validate the applicability of these methods to higher-resource languages. Finally, we outline ChatLSP, a conservative extension to the Language Server Protocol (LSP) that language servers can implement to expose capabilities that AI code completion systems of various designs can use to incorporate static context when generating prompts for an LLM.
Zıtlık İfadeleri ile Kurulan Dobruca Kırım Tatar Atasözleri
Serkan Akın
Dobruja Crimean Tatar proverbs formed with expressions of contrast
Efforts to collect and compile the proverbs of the Turkic world are important undertakings. This study examines the proverbs of the Crimean Tatars settled in the Dobruja region. It aims to determine by what means expressions of contrast are formed in the structure of these proverbs and how frequently they are used. The study does not use a diachronic method; its conclusions are drawn from the examination of a single book of proverbs treated as the corpus. The classification is based on the types of semantic relation between the elements that create the contrast. Because the goal is to reveal cognitive meaning patterns, the grammatical properties of the contrasting elements were not taken into account in the classification. The analysis concludes that contrast is an important phenomenon in the structure of the proverbs of the Crimean Tatar Turks living in the Dobruja region.
Proceedings of the 18th International Workshop on Logical Frameworks and Meta-Languages: Theory and Practice
Alberto Ciaffaglione, Carlos Olarte
Logical frameworks and meta-languages form a common substrate for representing, implementing and reasoning about a wide variety of deductive systems of interest in logic and computer science. Their design, implementation and their use in reasoning tasks, ranging from the correctness of software to the properties of formal systems, have been the focus of considerable research over the last two decades. This workshop brings together designers, implementors and practitioners to discuss various aspects impinging on the structure and utility of logical frameworks, including the treatment of variable binding, inductive and co-inductive reasoning techniques and the expressiveness and lucidity of the reasoning process.
Polymorphic Type Inference for Dynamic Languages
Giuseppe Castagna, Mickaël Laurent, Kim Nguyen
We present a type system that combines, in a controlled way, first-order polymorphism with intersection types, union types, and subtyping, and prove its safety. We then define a type reconstruction algorithm that is sound and terminating. This yields a system in which unannotated functions are given polymorphic types (thanks to Hindley-Milner) that can express the overloaded behavior of the functions they type (thanks to the intersection introduction rule) and that are deduced by applying advanced techniques of type narrowing (thanks to the union elimination rule). This makes the system a prime candidate to type dynamic languages.
Suspension Analysis and Selective Continuation-Passing Style for Universal Probabilistic Programming Languages
Daniel Lundén, Lars Hummelgren, Jan Kudlicka
et al.
Universal probabilistic programming languages (PPLs) make it relatively easy to encode and automatically solve statistical inference problems. To solve inference problems, PPL implementations often apply Monte Carlo inference algorithms that rely on execution suspension. State-of-the-art solutions enable execution suspension either through (i) continuation-passing style (CPS) transformations or (ii) efficient, but comparatively complex, low-level solutions that are often not available in high-level languages. CPS transformations introduce overhead due to unnecessary closure allocations -- a problem the PPL community has generally overlooked. To reduce overhead, we develop a new efficient selective CPS approach for PPLs. Specifically, we design a novel static suspension analysis technique that determines parts of programs that require suspension, given a particular inference algorithm. The analysis allows selectively CPS transforming the program only where necessary. We formally prove the correctness of the analysis and implement the analysis and transformation in the Miking CorePPL compiler. We evaluate the implementation for a large number of Monte Carlo inference algorithms on real-world models from phylogenetics, epidemiology, and topic modeling. The evaluation results demonstrate significant improvements across all models and inference algorithms.
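The idea of selective CPS described in this abstract can be sketched in a few lines. The sketch is not the Miking CorePPL implementation: the names `Suspension`, `weight_cps`, `pure_helper`, `model_cps` and `run` are hypothetical, and the choice of which function to leave in direct style is made by hand here, whereas the paper derives it from a static suspension analysis.

```python
class Suspension:
    """A paused execution: a log-weight plus a closure that resumes the program."""

    def __init__(self, log_weight, resume):
        self.log_weight = log_weight
        self.resume = resume  # zero-argument closure continuing execution


def pure_helper(x):
    # Never suspends, so it stays in direct style: no continuation closure is allocated.
    return x * x + 1


def weight_cps(log_weight, k):
    # A checkpoint: hand control back to the inference loop, remembering how to continue.
    return Suspension(log_weight, lambda: k(None))


def model_cps(x, k):
    y = pure_helper(x)  # direct-style call inside a CPS function
    return weight_cps(-0.5 * y, lambda _: k(y + 1))


def run(model, x):
    """Drive a CPS model to completion, recording the log-weight at each suspension."""
    log_weights = []
    result = model(x, lambda value: value)
    while isinstance(result, Suspension):
        log_weights.append(result.log_weight)
        result = result.resume()
    return result, log_weights


final_value, seen_weights = run(model_cps, 2)
```

A full CPS transformation would also rewrite `pure_helper` to take a continuation; leaving it untouched is the kind of saving in closure allocations that the abstract refers to.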
Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models
Atnafu Lambebo Tonja, Hellina Hailu Nigatu, Olga Kolesnikova
et al.
This paper describes CIC NLP's submission to the AmericasNLP 2023 Shared Task on machine translation systems for indigenous languages of the Americas. We present the system descriptions for three methods. We used two multilingual models, namely M2M-100 and mBART50, and one bilingual (one-to-one) model, the Helsinki NLP Spanish-English translation model, and experimented with different transfer learning setups. We experimented with 11 languages from the Americas and report the setups we used as well as the results we achieved. Overall, the mBART setup was able to improve upon the baseline for three out of the eleven languages.
Automatic Alignment in Higher-Order Probabilistic Programming Languages
Daniel Lundén, Gizem Çaylak, Fredrik Ronquist
et al.
Probabilistic Programming Languages (PPLs) allow users to encode statistical inference problems and automatically apply an inference algorithm to solve them. Popular inference algorithms for PPLs, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC), are built around checkpoints -- relevant events for the inference algorithm during the execution of a probabilistic program. Deciding the location of checkpoints is, in current PPLs, not done optimally. To solve this problem, we present a static analysis technique that automatically determines checkpoints in programs, relieving PPL users of this task. The analysis identifies a set of checkpoints that execute in the same order in every program run -- they are aligned. We formalize alignment, prove the correctness of the analysis, and implement the analysis as part of the higher-order functional PPL Miking CorePPL. By utilizing the alignment analysis, we design two novel inference algorithm variants: aligned SMC and aligned lightweight MCMC. We show, through real-world experiments, that they significantly improve inference execution time and accuracy compared to standard PPL versions of SMC and MCMC.
„See inimene oskab näha! Oskab kirjutada!”. Stalini preemia kui kirjanduselu nõukogustamise vahend Hans Leberechti näitel
Tõnu Tannberg
“The man can see! Can write!”: The Stalin Prize as a tool for the Sovietization of literary life, on the example of Hans Leberecht
The Stalin Prize (established in 1939) was awarded in two broad fields: for (1) groundbreaking scientific achievements and inventions, and (2) outstanding literary and artistic achievements. The procedure for selecting and nominating the candidates and making the final decision was overseen by the party apparatus and by Joseph Stalin personally. The prize was an instrument of exerting control over intellectual life as well as an important link in the system of social etiquette (recognition, perks and privilege) of the time. Especially noteworthy was the prize’s role in the introduction of the creative mode (so-called socialist realism) favoured by the regime.
The prize bestowed for literary and artistic achievements received particular attention across society. In the Estonian SSR, the Stalin Prize was awarded 55 times (to 42 people in total) between 1946 and 1952. In the field of literature, August Jakobson (1947, 1948), Hans Leberecht (1949) and Juhan Smuul (1952) received the prize.
On 18 January 1949, the leaders of the Baltic Soviet Republics met with Stalin in the Kremlin, where a decision was made to carry out a large-scale deportation. The work notebook of the Estonian SSR party leader Nikolai Karotamm reveals that during the meeting Stalin heaped praise on Hans Leberecht’s recently published novella “Light in Koordi” (Valgus Koordis) (1948): “The man can see! Can write!” For Stalin, what mattered was not the literary value of the work, but its ideological suitability, in this case given the context of the fracture awaiting Estonian villages and the society as a whole – the deportation and mass collectivization. Stalin’s endorsement changed the fate of Leberecht as a Soviet writer – the novella was awarded the Stalin Prize and overnight he became one of the Estonian SSR’s most prestigious regime-friendly authors.
The article analyzes the working principles of the Stalin Prize at the Soviet Republic level on the example of Leberecht’s case. It discusses, among other things, the backstory, the institutional framework (the procedure for candidate submission, the role of the creative union and the party apparatus, and the later penitence), the intrigues arising in the literary circle from the nomination of candidates, the discovery of Leberecht’s literary “talent”, as well as the hurried translation and publication of his novella.
Other Finnic languages and dialects
Succession in tariqa (in the example of the work "nazmu’s-silsile")
BADIA MUHITDINOVA
The work "Nazmu's-silsile" by the scholar Seyidahmed Vesli Semerkandî, who lived in the second half of the 19th century and the beginning of the 20th century, covers philosophical-theological, spiritual-educational and moral-educational issues. It is therefore of great importance for our time. Although historical works on the lives and activities of Sufi sheikhs have been researched and analyzed in many later studies of Sufism, the development of Sufism has not been discussed systematically, with its periods connected to one another, and no holistic study of the Sufi chains of succession has been carried out. "Nazmu's-silsile" offers a convenient opportunity for research in this area.
This article gives detailed information, in the context of "Nazmu's-silsile", about the traditions of succession in the order and about the pir-murid (teacher-apprentice) ties of the highly moral, high-ranking, perfected people who were enlightened by the deep knowledge of the entire Turanian land, especially Transoxiana. The philosophical-theological, spiritual-educational and moral-educational issues in the work are also analyzed. In addition to describing the lives of the representatives of the Naqshbandi order, the article presents historical information about them, their contribution to Sufism, and the branches of the order.
Language and Literature, Ural-Altaic languages
The /-sa/ suffixed connection prepositions in contemporary Turkic dialects
SEHER ERENBAŞ PEHLİVAN
The suffix /-sA/, which is considered the conjunctive mood in Turkey Turkish, has many other functions; basically, however, it still retains its old function as a gerundium. In conditional usages (except for the subjunctive mood), the /-sA/ suffix is conjugated with a personal ending, yet it does not express a complete judgement in the sentence as the indicative and subjunctive moods do. Some researchers therefore think that this suffix still functions as a gerundium and should be evaluated in the gerundium category. The suffix /-sA/ in the structure of the connecting prepositions in Turkish also has the gerundium function and forms a subordinate clause based on the gerundium group. For this reason, it can be said that this suffix has become fossilized over time and forms the connecting prepositions. Although the /-sA/ suffix has been studied many times, very few studies refer to its prepositional or connecting function. This study therefore identifies the connecting prepositions formed with the /-sA/ suffix in contemporary Turkic dialects (Southwest/Southeast, Northwest/Northeast), especially in Turkey Turkish, and classifies and evaluates them by the meaning or function they express in the sentence. The connecting prepositions, all of which were found to form subordinate clauses, were examined under 21 headings, with examples given for each. Some of the evaluated conjunctions with /-sA/ were found to consist of 1st person singular, 2nd person singular and 2nd person plural conjugations. Most of them do not take the suffix and person suffixes. All of these prepositions contain main auxiliary verbs (-i, ol-/bol-).
Language and Literature, Ural-Altaic languages
Compiling Universal Probabilistic Programming Languages with Efficient Parallel Sequential Monte Carlo Inference
Daniel Lundén, Joey Öhman, Jan Kudlicka
et al.
Probabilistic programming languages (PPLs) allow users to encode arbitrary inference problems, and PPL implementations provide general-purpose automatic inference for these problems. However, constructing inference implementations that are efficient enough is challenging for many real-world problems. Often, this is due to PPLs not fully exploiting available parallelization and optimization opportunities. For example, handling probabilistic checkpoints in PPLs through continuation-passing style transformations or non-preemptive multitasking -- as is done in many popular PPLs -- often disallows compilation to low-level languages required for high-performance platforms such as GPUs. To solve the checkpoint problem, we introduce the concept of PPL control-flow graphs (PCFGs) -- a simple and efficient approach to checkpoints in low-level languages. We use this approach to implement RootPPL: a low-level PPL built on CUDA and C++ with OpenMP, providing highly efficient and massively parallel SMC inference. We also introduce a general method of compiling universal high-level PPLs to PCFGs and illustrate its application when compiling Miking CorePPL -- a high-level universal PPL -- to RootPPL. The approach is the first to compile a universal PPL to GPUs with SMC inference. We evaluate RootPPL and the CorePPL compiler through a set of real-world experiments in the domains of phylogenetics and epidemiology, demonstrating up to 6x speedups over state-of-the-art PPLs implementing SMC inference.