Turn Complexity of Context-free Languages, Pushdown Automata and One-Counter Automata
Giovanni Pighizzini
A turn in a computation of a pushdown automaton is a switch from a phase in which the height of the pushdown store increases to a phase in which it decreases. Given a pushdown or one-counter automaton, we consider, for each string in its language, the minimum number of turns made in accepting computations. We prove that it cannot be decided whether this number is bounded by any constant. Furthermore, we obtain a non-recursive trade-off between pushdown and one-counter automata accepting in a finite number of turns and finite-turn pushdown automata, which are defined by requiring that the constant bound hold for every accepting computation. We prove that there are languages accepted in a sublinear but not constant number of turns, with respect to the input length. Furthermore, there exists an infinite proper hierarchy of complexity classes, with the number of turns bounded by different sublinear functions. In addition, there is a language requiring a number of turns which is not constant but grows more slowly than each of the functions defining the above hierarchy.
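As a small illustration (not taken from the paper), the language {aⁿbⁿ} can be accepted by a one-counter automaton making exactly one turn: the counter only increases while reading a's and only decreases while reading b's. The sketch below simulates that automaton and counts turns as switches from pushing to popping.

```python
def count_turns(word):
    """Check membership in {a^n b^n} with a counter and count turns:
    push on 'a', pop on 'b'; a turn is a switch from pushing to popping.
    Returns the turn count for accepted words, None for rejected ones."""
    height, turns, last_move = 0, 0, None
    for ch in word:
        if ch == 'a':
            if last_move == 'pop':
                return None  # an 'a' after a 'b': not of the form a^n b^n
            height += 1
            last_move = 'push'
        elif ch == 'b':
            if height == 0:
                return None  # more b's than a's
            if last_move == 'push':
                turns += 1   # counter height switches from increasing to decreasing
            height -= 1
            last_move = 'pop'
        else:
            return None
    return turns if height == 0 else None

print(count_turns('aaabbb'))  # → 1
print(count_turns('ababab'))  # not in the language → None
```

Every word of {aⁿbⁿ} is accepted with at most one turn, which is the kind of constant bound whose existence the paper proves undecidable in general.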
To have, or maybe not: on the distribution and interpretation of ditransitives in Icelandic and Faroese
C. Ussery, G. R. Harðarson, Annika Simonsen
Congress Report: “Die hispanische Welt neu denken? Perspektiven deutschsprachiger Schriftstellerinnen”, 15–17 January 2025, Valencia, Spain
Serra Yılmaz As, Şebnem Sunar
On 15–17 January 2025, an international congress was held in Valencia, one of Spain’s major cultural centers, with the aim of discussing German-language women writers’ views of the Hispanic world through the themes of literature, culture, identity, and history. The congress was organized under the title “Die hispanische Welt neu denken? Perspektiven deutschsprachiger Schriftstellerinnen”, within the framework of research project CIAICO 2022/105 conducted by the Department of German Philology of the Faculty of Philology, Translation and Communication at the University of Valencia. The organization was undertaken by the RIALE Research Group. The event examined how the Hispanic world is represented in the works of German-language women writers. The discussions, conducted above all on the basis of odepórica, that is, travel writing, focused on topics such as the construction of stereotypes, the influence of the male gaze, and the place of women travelers in the European cultural imaginary; they gained depth around themes such as cultural transfer, aesthetic experiences, migration stories, and the representation of women in travel writing. Accordingly, attention centered on the reflections of the female perspective in historical and cultural contexts. Indeed, the congress aimed to reinterpret the relations between the Hispanic and German cultural worlds in light of different historical, cultural, and methodological contexts, above all an innovative gender perspective. Proceeding within an interdisciplinary framework, the congress brought together scholars from fields such as German and Spanish literary studies, history, and gender studies, offering a rich environment for intellectual exchange. The congress thus constituted a productive academic platform for assessing the multilayered relations between German literature and the Hispanic world in historical, cultural, and literary contexts.
German literature, Germanic languages. Scandinavian languages
CrossTL: A Universal Programming Language Translator with Unified Intermediate Representation
Nripesh Niketan, Vaatsalya Shrivastva
We present CrossTL, a universal programming language translator enabling bidirectional translation between multiple languages through a unified intermediate representation called CrossGL. Traditional approaches require separate translators for each language pair, leading to exponential complexity growth. CrossTL uses a single universal IR to facilitate translations between CUDA, HIP, Metal, DirectX HLSL, OpenGL GLSL, Vulkan SPIR-V, Rust, and Mojo, with Slang support in development. Our system consists of: language-specific lexers/parsers converting source code to ASTs, bidirectional CrossGL translation modules implementing ToCrossGLConverter classes for importing code and CodeGen classes for target generation, and comprehensive backend implementations handling full translation pipelines. We demonstrate effectiveness through comprehensive evaluation across programming domains, achieving successful compilation and execution across all supported backends. The universal IR design enables adding new languages with minimal effort, requiring only language-specific frontend/backend components. Our contributions include: (1) a unified IR capturing semantics of multiple programming paradigms, (2) a modular architecture enabling extensibility, (3) a comprehensive framework supporting GPU compute, graphics programming, and systems languages, and (4) empirical validation demonstrating practical viability of universal code translation. CrossTL represents a significant step toward language-agnostic programming, enabling write-once, deploy-everywhere development.
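The hub-and-spoke economics behind a unified IR can be sketched in a few lines. In this toy (the names `frontends`, `backends`, and `translate` are illustrative placeholders, not CrossTL's actual API), each language contributes one frontend to the IR and one backend from it, so n languages need 2n components instead of n·(n−1) pairwise translators.

```python
# A single IR node standing in for CrossGL in this sketch.
IR_ADD = ("add", "x", "y")

# Frontend per language: source text -> universal IR.
frontends = {
    "lang_a": lambda src: IR_ADD if src == "x + y" else None,
    "lang_b": lambda src: IR_ADD if src == "(add x y)" else None,
}

# Backend per language: universal IR -> target text.
backends = {
    "lang_a": lambda ir: "x + y" if ir[0] == "add" else None,
    "lang_b": lambda ir: "(add x y)" if ir[0] == "add" else None,
}

def translate(src, src_lang, dst_lang):
    ir = frontends[src_lang](src)   # source -> IR
    return backends[dst_lang](ir)   # IR -> target

print(translate("x + y", "lang_a", "lang_b"))  # → (add x y)
```

Adding a third language to this scheme means writing one new frontend and one new backend, after which it can translate to and from every existing language, which is the extensibility property the abstract claims.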
Graph Rewriting Language as a Platform for Quantum Diagrammatic Calculi
Kayo Tei, Haruto Mishina, Naoki Yamamoto
et al.
Systematic discovery of optimization paths in quantum circuit simplification remains a challenge. Today, ZX-calculus, a computing model for quantum circuit transformation, is attracting attention for its highly abstract graph-based approach. Whereas existing tools such as PyZX and Quantomatic offer domain-specific support for quantum circuit optimization, visualization and theorem-proving, we present a complementary approach using LMNtal, a general-purpose hierarchical graph rewriting language, to establish a diagrammatic transformation and verification platform with model checking. Our methodology shows three advantages: (1) manipulation of ZX-diagrams through native graph transformation rules, enabling direct implementation of basic rules; (2) quantified pattern matching via QLMNtal extensions, greatly simplifying rule specification; and (3) interactive visualization and validation of optimization paths through state space exploration. Through case studies, we demonstrate how our framework helps understand optimization paths and design new algorithms and strategies. This suggests that the declarative language LMNtal and its toolchain could serve as a new platform to investigate quantum circuit transformation from a different perspective.
From Roots to Borrowings: The Evolution of the English Lexicon
Alaviyya Nuri
The English lexicon is a dynamic and evolving entity shaped by centuries of internal development and external linguistic influences. This study explores the historical roots and borrowings that have contributed to its rich and diverse vocabulary. Through a historical linguistic approach, comparative analysis, and corpus-based studies, the research examines the interplay between native Germanic elements and borrowed terms from Latin, French, Scandinavian, and other languages. Findings reveal that borrowing has played a pivotal role in filling lexical gaps, enriching the lexicon, and reflecting sociocultural transformations such as Christianization, the Norman Conquest, and globalization. Borrowed terms, from Latin religious vocabulary to contemporary technology-related words, demonstrate the adaptability and inclusivity of English. The study also addresses challenges in categorizing borrowings, particularly the distinction between fully integrated terms and recent loanwords. The results highlight the lexicon as a testament to cultural and linguistic exchange, with implications for understanding English as a global lingua franca. Future research should focus on underexplored influences, such as Indigenous contributions and the role of digital communication in accelerating modern borrowing trends.
The presence of German in the linguistic landscape of a city in the interior of SP: discussion, application and pedagogical possibilities
Nádia Cristina Dini
Linguistic Landscape, a relatively recent research area, deals with the written form of languages visible in public space. Its data-collection methodology involves photographing and documenting the different types of signage, the languages used, and their functions in the urban space. Linguistic Landscape research generally considers the simultaneous presence of several languages or varieties and their relation to one another; within the approach called Spot German, however, which is not necessarily used for academic purposes, data collection can be carried out with a specific goal in mind, without documenting every visible sign. In this article, I therefore discuss the characteristics, definitions, and differences between these approaches and briefly present possible sources for data collection, drawing on research conducted in several countries. In addition, as an example of application, I examine the German language in its relation to the area surrounding a German school and its possible influence on the linguistic landscape of the city in which the school is located. Finally, I present pedagogical possibilities for exploring the linguistic landscape in language teaching, which offers the opportunity to investigate the relationship of the city and its people with the surrounding languages, while also providing numerous entry points for didactic exploration.
German literature, Germanic languages. Scandinavian languages
Polymorphic Records for Dynamic Languages
Giuseppe Castagna, Loïc Peyrot
We define and study "row polymorphism" for a type system with set-theoretic types, specifically union, intersection, and negation types. We consider record types that embed row variables and define a subtyping relation by interpreting types into sets of record values and by defining subtyping as the containment of interpretations. We define a functional calculus equipped with operations for field extension, selection, and deletion, its operational semantics, and a type system that we prove to be sound. We provide algorithms for deciding the typing and subtyping relations. This research is motivated by the current trend of defining static type systems for dynamic languages and, in our case, by an ongoing effort of endowing the Elixir programming language with a gradual type system.
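The three record operations the calculus types have a direct dynamic-language reading. The sketch below (a minimal illustration, not the paper's calculus) models records as dictionaries; the row-variable machinery lives entirely in the type system, so at runtime the operations are this simple.

```python
def extend(record, field, value):
    """Field extension: return a record with `field` added (or overridden)."""
    return {**record, field: value}

def select(record, field):
    """Field selection; raises KeyError if the field is absent, which is
    exactly the error a sound record type system rules out statically."""
    return record[field]

def delete(record, field):
    """Field deletion: return a record without `field`."""
    return {k: v for k, v in record.items() if k != field}

r = extend({"x": 1}, "y", 2)
print(select(r, "y"))   # → 2
print(delete(r, "x"))   # → {'y': 2}
```

A row variable lets a type like `{x: int, ...ρ}` say "has field x, plus whatever fields ρ stands for", so `delete(r, "x")` can be given the precise type `{...ρ}` rather than losing all information about the remaining fields.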
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana
et al.
Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.
Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025)
Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson
et al.
The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in conjunction with the 31st International Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, United Arab Emirates. This workshop mainly aimed to provide a forum for researchers to share and discuss their ongoing work on language models (LMs) focusing on low-resource languages, following the recent advancements in neural language models and their linguistic biases towards high-resource languages. LoResLM 2025 attracted notable interest from the natural language processing (NLP) community, resulting in 35 accepted papers from 52 submissions. These contributions cover a broad range of low-resource languages from eight language families and 13 diverse research areas, paving the way for future possibilities and promoting linguistic inclusivity in NLP.
Statically Contextualizing Large Language Models with Typed Holes
Andrew Blinn, Xiang Li, June Hyung Kim
et al.
Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate context, particularly when working with definitions not in the training data nor near the cursor. This paper demonstrates that tight integration with the type and binding structure of a language, as exposed by its language server, can address this contextualization problem in a token-efficient manner. In short, we contend that AIs need IDEs, too! In particular, we integrate LLM code generation into the Hazel live program sketching environment. The Hazel Language Server identifies the type and typing context of the hole being filled, even in the presence of errors, ensuring that a meaningful program sketch is always available. This allows prompting with codebase-wide contextual information not lexically local to the cursor, nor necessarily in the same file, but that is likely to be semantically local to the developer's goal. Completions synthesized by the LLM are then iteratively refined via further dialog with the language server. To evaluate these techniques, we introduce MVUBench, a dataset of model-view-update (MVU) web applications. These applications serve as challenge problems due to their reliance on application-specific data structures. We find that contextualization with type definitions is particularly impactful. After introducing our ideas in the context of Hazel we duplicate our techniques and port MVUBench to TypeScript in order to validate the applicability of these methods to higher-resource languages. Finally, we outline ChatLSP, a conservative extension to the Language Server Protocol (LSP) that language servers can implement to expose capabilities that AI code completion systems of various designs can use to incorporate static context when generating prompts for an LLM.
SWEb: A Large Web Dataset for the Scandinavian Languages
Tobias Norlund, Tim Isbister, Amaru Cuba Gyllensten
et al.
This paper presents the hitherto largest pretraining dataset for the Scandinavian languages: the Scandinavian WEb (SWEb), comprising over one trillion tokens. The paper details the collection and processing pipeline, and introduces a novel model-based text extractor that significantly reduces complexity in comparison with rule-based approaches. We also introduce a new cloze-style benchmark for evaluating language models in Swedish, and use this test to compare models trained on the SWEb data to models trained on FineWeb, with competitive results. All data, models and code are shared openly.
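Cloze-style evaluation of the kind the abstract mentions can be sketched as follows. The items and the trivial "model" here are invented placeholders, not the benchmark's actual data or a real language model: each item pairs a text containing a blank with a gold answer, and accuracy is the fraction of items the model answers correctly.

```python
def cloze_accuracy(model, items):
    """Score a model (any callable text -> guess) on (text, answer) pairs."""
    correct = sum(model(text) == answer for text, answer in items)
    return correct / len(items)

# Two invented Swedish-style cloze items (placeholders, not SWEb data).
items = [
    ("Stockholm är Sveriges ____.", "huvudstad"),
    ("Ett år har tolv ____.", "månader"),
]

# A trivial constant baseline that always answers "huvudstad".
baseline = lambda text: "huvudstad"
print(cloze_accuracy(baseline, items))  # → 0.5
```

A real evaluation would replace the baseline with a language model scoring candidate completions, but the accuracy bookkeeping is the same.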
Proceedings of the 18th International Workshop on Logical Frameworks and Meta-Languages: Theory and Practice
Alberto Ciaffaglione, Carlos Olarte
Logical frameworks and meta-languages form a common substrate for representing, implementing and reasoning about a wide variety of deductive systems of interest in logic and computer science. Their design, implementation and their use in reasoning tasks, ranging from the correctness of software to the properties of formal systems, have been the focus of considerable research over the last two decades. This workshop brings together designers, implementors and practitioners to discuss various aspects impinging on the structure and utility of logical frameworks, including the treatment of variable binding, inductive and co-inductive reasoning techniques and the expressiveness and lucidity of the reasoning process.
Polymorphic Type Inference for Dynamic Languages
Giuseppe Castagna, Mickaël Laurent, Kim Nguyen
We present a type system that combines, in a controlled way, first-order polymorphism with intersection types, union types, and subtyping, and prove its safety. We then define a type reconstruction algorithm that is sound and terminating. This yields a system in which unannotated functions are given polymorphic types (thanks to Hindley-Milner) that can express the overloaded behavior of the functions they type (thanks to the intersection introduction rule) and that are deduced by applying advanced techniques of type narrowing (thanks to the union elimination rule). This makes the system a prime candidate to type dynamic languages.
Suspension Analysis and Selective Continuation-Passing Style for Universal Probabilistic Programming Languages
Daniel Lundén, Lars Hummelgren, Jan Kudlicka
et al.
Universal probabilistic programming languages (PPLs) make it relatively easy to encode and automatically solve statistical inference problems. To solve inference problems, PPL implementations often apply Monte Carlo inference algorithms that rely on execution suspension. State-of-the-art solutions enable execution suspension either through (i) continuation-passing style (CPS) transformations or (ii) efficient, but comparatively complex, low-level solutions that are often not available in high-level languages. CPS transformations introduce overhead due to unnecessary closure allocations -- a problem the PPL community has generally overlooked. To reduce overhead, we develop a new efficient selective CPS approach for PPLs. Specifically, we design a novel static suspension analysis technique that determines parts of programs that require suspension, given a particular inference algorithm. The analysis allows selectively CPS transforming the program only where necessary. We formally prove the correctness of the analysis and implement the analysis and transformation in the Miking CorePPL compiler. We evaluate the implementation for a large number of Monte Carlo inference algorithms on real-world models from phylogenetics, epidemiology, and topic modeling. The evaluation results demonstrate significant improvements across all models and inference algorithms.
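The core idea of selective CPS can be sketched concretely. In the toy below (a hedged illustration of the general technique, not Miking CorePPL's implementation), only the operation that may suspend, here `sample`, is written in continuation-passing style; the surrounding arithmetic stays in direct style, which is precisely where the approach avoids closure-allocation overhead.

```python
def sample(dist, k):
    """A suspending operation in CPS: instead of returning a value,
    hand the current continuation `k` to the inference engine."""
    return ("suspend", dist, k)

def model():
    # Direct-style code: no continuations, no closure allocation here.
    prior_mean = 2 + 3
    # Only the suspension point is CPS-transformed: the rest of the
    # program becomes the continuation passed to `sample`.
    return sample(("normal", prior_mean, 1.0),
                  lambda x: ("result", x * 2))

def resume_with(value, suspended):
    """A trivial stand-in for an inference engine: pick a value for the
    suspended sample and resume its continuation."""
    tag, dist, k = suspended
    return k(value)

print(resume_with(4.0, model()))  # → ('result', 8.0)
```

A Monte Carlo algorithm exploits this structure by running the continuation many times with different sampled values; the static suspension analysis described in the abstract decides which parts of a program must be transformed into this shape at all.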
Kann das doch weg? Dealing with the German Modal Particle doch in Albanian Literary Texts’ Translation
Blertë İsmajli, Vjosa Hamiti
In linguistics, the translatability of the German modal particles is considered a popular study object, particularly because of their rendering in “particle-poor” languages. Thus, the present study examines the translatability of the modal particle doch in Albanian literary texts. Compared to German, Albanian does not have a group of modal particles and can, therefore, be classified as a “particle-poor” language. In general, (inter)subjective language elements in Albanian, such as modal particles in this case, acquire their full meaning only in the context of use. Therefore, the translation must take the context into account. When translating from German into a particle-poor language, other means of expression available in the language, such as morphological or prosodic patterns, or a combination of these elements, are chosen to express the particular modal nuance. These expressions can also include contextual-pragmatic means and the use of extra-linguistic factors such as intonation, voice pitch, sentence accent, gestures, and facial expressions. This empirical corpus analysis is based on two novels, Franz Kafka’s “The Trial” (Der Prozess) and Hermann Hesse’s “Siddhartha – A Poem of India” (Siddhartha – eine indische Dichtung), and their two sets of translations into Albanian. Examining two translations of the same literary text allows for a better understanding of the lexical and syntactical means used in Albanian to render the modal particle doch. During the corpus analysis, the translation equivalents of doch in Albanian were extracted and classified into omission, transposition, paraphrasing, and word-for-word translations, based on the common translation-theoretical definitions.
German literature, Germanic languages. Scandinavian languages
Measuring Harmful Representations in Scandinavian Language Models
Samia Touileb, Debora Nozza
Scandinavian countries are perceived as role-models when it comes to gender equality. With the advent of pre-trained language models and their widespread usage, we investigate to what extent gender-based harmful and toxic content exist in selected Scandinavian language models. We examine nine models, covering Danish, Swedish, and Norwegian, by manually creating template-based sentences and probing the models for completion. We evaluate the completions using two methods for measuring harmful and toxic completions and provide a thorough analysis of the results. We show that Scandinavian pre-trained language models contain harmful and gender-based stereotypes with similar values across all languages. This finding goes against the general expectations related to gender equality in Scandinavian countries and shows the possible problematic outcomes of using such models in real-world settings.
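Template-based probing of the kind described above can be sketched in a few lines. The templates and subject words below are invented placeholders, not the paper's actual material: the idea is to fill varying subjects into fixed sentence frames, collect model completions for the masked slot, and score them afterwards.

```python
# Invented placeholder templates and subjects (not the study's data).
templates = ["{subject} is known for being [MASK]."]
subjects = ["He", "She"]

def build_probes(templates, subjects):
    """Cross each template with each subject to produce probe sentences."""
    return [t.format(subject=s) for t in templates for s in subjects]

probes = build_probes(templates, subjects)
print(probes)
# A real study would feed each probe to a masked language model and
# score its completions with harmfulness/toxicity classifiers, then
# compare scores across the gendered subjects.
```

Differences in completion scores between the gendered variants of the same template are what reveal the stereotyped associations the paper reports.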
Katharina Jacob, Klaus-Peter Konerding & Wolf-Andreas Liebert (Hg.). 2020. Sprache und Empathie. Beiträge zur Grundlegung eines linguistischen Forschungsprogramms (Sprache und Wissen 42). Berlin, Boston: De Gruyter. 631 S.
Philipp Dreesen
Germanic languages. Scandinavian languages
Compiling Universal Probabilistic Programming Languages with Efficient Parallel Sequential Monte Carlo Inference
Daniel Lundén, Joey Öhman, Jan Kudlicka
et al.
Probabilistic programming languages (PPLs) allow users to encode arbitrary inference problems, and PPL implementations provide general-purpose automatic inference for these problems. However, constructing inference implementations that are efficient enough is challenging for many real-world problems. Often, this is due to PPLs not fully exploiting available parallelization and optimization opportunities. For example, handling probabilistic checkpoints in PPLs through continuation-passing style transformations or non-preemptive multitasking -- as is done in many popular PPLs -- often disallows compilation to low-level languages required for high-performance platforms such as GPUs. To solve the checkpoint problem, we introduce the concept of PPL control-flow graphs (PCFGs) -- a simple and efficient approach to checkpoints in low-level languages. We use this approach to implement RootPPL: a low-level PPL built on CUDA and C++ with OpenMP, providing highly efficient and massively parallel SMC inference. We also introduce a general method of compiling universal high-level PPLs to PCFGs and illustrate its application when compiling Miking CorePPL -- a high-level universal PPL -- to RootPPL. The approach is the first to compile a universal PPL to GPUs with SMC inference. We evaluate RootPPL and the CorePPL compiler through a set of real-world experiments in the domains of phylogenetics and epidemiology, demonstrating up to 6x speedups over state-of-the-art PPLs implementing SMC inference.
Evaluating Transferability of BERT Models on Uralic Languages
Judit Ács, Dániel Lévai, András Kornai
Transformer-based language models such as BERT have outperformed previous models on a large number of English benchmarks, but their evaluation is often limited to English or a small number of well-resourced languages. In this work, we evaluate monolingual, multilingual, and randomly initialized language models from the BERT family on a variety of Uralic languages including Estonian, Finnish, Hungarian, Erzya, Moksha, Karelian, Livvi, Komi Permyak, Komi Zyrian, Northern Sámi, and Skolt Sámi. When monolingual models are available (currently only et, fi, hu), these perform better on their native language, but in general they transfer worse than multilingual models or models of genetically unrelated languages that share the same character set. Remarkably, straightforward transfer of high-resource models, even without special efforts toward hyperparameter optimization, yields what appear to be state of the art POS and NER tools for the minority Uralic languages where there is sufficient data for finetuning.