Results for "Germanic languages. Scandinavian languages"

arXiv Open Access 2026

Sequential densities of rational languages

Alexi Block Gorman, Dominique Perrin

We introduce the notion of density of a rational language with respect to a sequence of probability measures. We prove that if $(\mu_n)$ is a sequence of Bernoulli measures converging to a positive Bernoulli measure $\overline{\mu}$, the sequential density is the ordinary density with respect to $\overline{\mu}$. We also prove that if $(\mu_n)$ is a sequence of invariant probability measures converging in the strong sense to an invariant probability measure $\overline{\mu}$, then the sequential density of every rational language exists for this sequence.
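
As a rough illustration of the ordinary density mentioned in this abstract, the sketch below computes the mass $\mu(L \cap A^n)$ of a rational language under a fixed Bernoulli measure by pushing a probability distribution through a complete DFA, then averages those masses. It assumes the density is taken as the Cesàro limit of these masses, which the abstract does not spell out; the function names, the example automaton (even number of `a`), and the letter probabilities are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def length_masses(
    delta: Dict[Tuple[int, str], int],
    start: int,
    accepting: Set[int],
    prob: Dict[str, float],
    horizon: int,
) -> List[float]:
    """Return [mu(L ∩ A^n) for n in 0..horizon-1] for the language L of a complete DFA.

    `prob` gives the Bernoulli weight of each letter (weights sum to 1).
    """
    dist = {start: 1.0}  # distribution over DFA states after reading n letters
    masses = []
    for _ in range(horizon):
        masses.append(sum(m for s, m in dist.items() if s in accepting))
        nxt: Dict[int, float] = {}
        for state, mass in dist.items():
            for letter, p in prob.items():
                target = delta[(state, letter)]
                nxt[target] = nxt.get(target, 0.0) + mass * p
        dist = nxt
    return masses


def cesaro_estimate(masses: List[float]) -> float:
    """Average of the per-length masses (assumed notion of density)."""
    return sum(masses) / len(masses)


# Hypothetical example: words over {a, b} with an even number of a's.
EVEN_A = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
WEIGHTS = {"a": 0.3, "b": 0.7}
MASSES = length_masses(EVEN_A, start=0, accepting={0}, prob=WEIGHTS, horizon=200)
ESTIMATE = cesaro_estimate(MASSES)
```

For this automaton the per-length mass is $(1 + 0.4^n)/2$, so the averages approach $1/2$ whatever the horizon, which is the kind of limit the density records.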

en math.DS, cs.FL

arXiv Open Access 2026

Out-of-Order Membership to Regular Languages

Antoine Amarilli, Sebastien Labbe, Charles Paperman

We introduce the task of out-of-order membership to a formal language L, where the letters of a word w are revealed one by one in an adversarial order. The length |w| is known in advance, but the content of w is streamed as pairs (i, w[i]), received exactly once for each position i, in arbitrary order. We study efficient algorithms for this task when L is regular, seeking tight complexity bounds as a function of |w| for a fixed target language. Most of our results apply to an algebraically defined variant dubbed out-of-order evaluation: this problem is defined for a fixed finite monoid or semigroup S, and our goal is to compute the ordered product of the streamed elements of w. We show that, for any fixed regular language or finite semigroup, both problems can be solved in constant time per streamed symbol and in linear space. However, the precise space complexity strongly depends on the algebraic structure of the target language or evaluation semigroup. Our main contributions are therefore to show (deterministic) space complexity characterizations, which we do for out-of-order evaluation of monoids and semigroups. For monoids, we establish a trichotomy: the space complexity is either Θ(1), Θ(log n), or Θ(n), where n = |w|. More specifically, the problem admits a constant-space solution for commutative monoids, while all non-commutative monoids require Ω(log n) space. We further identify a class of monoids admitting an O(log n)-space algorithm, and show that all remaining monoids require Ω(n) space. For general semigroups, the situation is more intricate. We characterize a class of semigroups admitting constant-space algorithms for out-of-order evaluation, and show that semigroups outside this class require at least Ω(log n) space.
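
The contrast between the commutative and the general case described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithms: `evaluate_commutative` keeps a single accumulator (constant space, valid only when the monoid is commutative), while `evaluate_buffered` is a naive linear-space baseline that stores every element and multiplies in position order once the stream ends. All names and the example monoid (transformations of a three-element set) are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def evaluate_commutative(
    stream: Iterable[Tuple[int, T]], op: Callable[[T, T], T], identity: T
) -> T:
    """Fold elements in arrival order; correct only if `op` is commutative."""
    acc = identity
    for _position, element in stream:
        acc = op(acc, element)
    return acc


def evaluate_buffered(
    n: int, stream: Iterable[Tuple[int, T]], op: Callable[[T, T], T], identity: T
) -> T:
    """Store each element at its position, then multiply left to right."""
    slots: List[Optional[T]] = [None] * n
    for position, element in stream:
        slots[position] = element
    acc = identity
    for element in slots:
        acc = op(acc, element)  # every slot is filled once the stream ends
    return acc


def compose(f: Tuple[int, ...], g: Tuple[int, ...]) -> Tuple[int, ...]:
    """Product in the transformation monoid: apply f first, then g."""
    return tuple(g[f[i]] for i in range(len(f)))


IDENTITY = (0, 1, 2)
A = (1, 2, 0)  # a cyclic permutation
B = (0, 0, 2)  # a non-injective map, so A and B do not commute

# Positions arrive out of order: position 1 first, then position 0.
ORDERED_PRODUCT = evaluate_buffered(2, [(1, B), (0, A)], compose, IDENTITY)
SUM_MOD_5 = evaluate_commutative(
    [(2, 3), (0, 1), (1, 2)], lambda x, y: (x + y) % 5, 0
)
```

Applying `evaluate_commutative` to the transformation monoid would return the product in arrival order rather than position order, which is why non-commutative monoids need more than a single accumulator.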

en cs.FL, cs.DS

arXiv Open Access 2025

A Type System for Data Privacy Compliance in Active Object Languages

Chinmayi Prabhu Baramashetru, Paola Giannini, Silvia Lizeth Tapia Tarifa et al.

Data protection laws such as GDPR aim to give users unprecedented control over their personal data. Compliance with these regulations requires systematically considering information flow and interactions among entities handling sensitive data. Privacy-by-design principles advocate embedding data protection into system architectures as a default. However, translating these abstract principles into concrete, explicit methods remains a significant challenge. This paper addresses this gap by proposing a language-based approach to privacy integration, combining static and runtime techniques. By employing type checking and type inference in an active object language, the framework enables the tracking of authorised data flows and the automatic generation of constraints checked at runtime based on user consent. This ensures that personal data is processed in compliance with GDPR constraints. The key contribution of this work is a type system that gathers the compliance checks and the changes to users' consent and integrates data privacy compliance verification into system execution. The paper demonstrates the feasibility of this approach through a soundness proof and several examples, illustrating how the proposed language addresses common GDPR requirements, such as user consent, purpose limitation, and data subject rights. This work advances the state of the art in privacy-aware system design by offering a systematic and automated method for integrating GDPR compliance into programming languages. This capability has implications for building trustworthy systems in domains such as healthcare or finance, where data privacy is crucial.

en cs.PL

arXiv Open Access 2025

PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference

Yufeng Gu, Alireza Khadem, Sumanth Umesh et al.

Large Language Model (LLM) inference uses an autoregressive manner to generate one token at a time, which exhibits notably lower operational intensity compared to earlier Machine Learning (ML) models such as encoder-only transformers and Convolutional Neural Networks. At the same time, LLMs possess large parameter sizes and use key-value caches to store context information. Modern LLMs support context windows with up to 1 million tokens to generate versatile text, audio, and video content. A large key-value cache unique to each prompt requires a large memory capacity, limiting the inference batch size. Both low operational intensity and limited batch size necessitate a high memory bandwidth. However, contemporary hardware systems for ML model deployment, such as GPUs and TPUs, are primarily optimized for compute throughput. This mismatch challenges the efficient deployment of advanced LLMs and makes users pay for expensive compute resources that are poorly utilized for the memory-bound LLM inference tasks. We propose CENT, a CXL-ENabled GPU-Free sysTem for LLM inference, which harnesses CXL memory expansion capabilities to accommodate substantial LLM sizes, and utilizes near-bank processing units to deliver high memory bandwidth, eliminating the need for expensive GPUs. CENT exploits a scalable CXL network to support peer-to-peer and collective communication primitives across CXL devices. We implement various parallelism strategies to distribute LLMs across these devices. Compared to GPU baselines with maximum supported batch sizes and similar average power, CENT achieves 2.3$\times$ higher throughput and consumes 2.9$\times$ less energy. CENT enhances the Total Cost of Ownership (TCO), generating 5.2$\times$ more tokens per dollar than GPUs.

en cs.AR

arXiv Open Access 2025

The Formal Semantics and Implementation of a Domain-Specific Language for Mixed-Initiative Dialogs

Zachary S. Rowland, Saverio Perugini

Human-computer dialog plays a prominent role in interactions conducted at kiosks (e.g., withdrawing money from an ATM or filling your car with gas), on smartphones (e.g., installing and configuring apps), and on the web (e.g., booking a flight). Some human-computer dialogs involve an exchange of system-initiated and user-initiated actions. These dialogs are called *mixed-initiative dialogs* and sometimes also involve the pursuit of multiple interleaved sub-dialogs, which are woven together in a manner akin to coroutines. However, existing dialog-authoring languages have difficulty expressing these dialogs concisely. In this work, we improve the expressiveness of a dialog-authoring language we call *dialog specification language* (dsl), which is based on the programming concepts of functional application, partial function application, currying, and partial evaluation, by augmenting it with additional abstractions to support concise specification of task-based, mixed-initiative dialogs that resemble concurrently executing coroutines. We also formalize the semantics of dsl -- the process of simplifying and staging such dialogs specified in the language. We demonstrate that dialog specifications are compressed to a higher degree when written in dsl using the new abstractions. We also operationalize the formal semantics of dsl in a Haskell functional programming implementation. The Haskell implementation of the simplification/staging rules provides a proof of concept that the formal semantics are sufficient to implement a dialog system specified with the language. We evaluate dsl from practical (i.e., case study), conceptual (i.e., comparisons to similar systems such as VoiceXML and State Chart XML), and theoretical perspectives. The practical applicability of the new language abstractions introduced in this work is demonstrated in a case study by using it to model portions of an online food ordering system that can be concurrently staged.
Our results indicate that dsl enables concise representation of dialogs composed of multiple concurrent sub-dialogs and improves the compression of dialog expressions reported in prior research. We anticipate that the extension of our language and the formalization of the semantics can facilitate concise specification and smooth implementation of task-based, mixed-initiative, human-computer dialog systems across various domains such as ATMs and interactive, voice-response systems.

en cs.PL

arXiv Open Access 2024

A Bionic Natural Language Parser Equivalent to a Pushdown Automaton

Zhenghao Wei, Kehua Lin, Jianlin Feng

Assembly Calculus (AC), proposed by Papadimitriou et al., aims to reproduce advanced cognitive functions through simulating neural activities, with several applications based on AC having been developed, including a natural language parser proposed by Mitropolsky et al. However, this parser lacks the ability to handle Kleene closures, preventing it from parsing all regular languages and rendering it weaker than Finite Automata (FA). In this paper, we propose a new bionic natural language parser (BNLP) based on AC that integrates two new biologically rational structures, the Recurrent Circuit and the Stack Circuit, which are inspired by RNNs and the short-term memory mechanism. In contrast to the original parser, the BNLP can fully handle all regular languages and Dyck languages. Therefore, leveraging the Chomsky-Schützenberger theorem, a BNLP that can parse all Context-Free Languages can be constructed. We also formally prove that for any PDA, a Parser Automaton corresponding to BNLP can always be formed, ensuring that BNLP has a description ability equal to that of PDA and addressing the deficiencies of the original parser.

en cs.CL, cs.FL

arXiv Open Access 2024

Exo 2: Growing a Scheduling Language

Yuka Ikarashi, Kevin Qian, Samir Droubi et al.

User-schedulable languages (USLs) help programmers productively optimize programs by providing safe means of transforming them. Current USLs are designed to give programmers exactly the control they want, while automating all other concerns. However, there is no universal answer for what performance-conscious programmers want to control, how they want to control it, and what they want to automate, even in relatively narrow domains. We claim that USLs should, instead, be designed to grow. We present Exo 2, a scheduling language that enables users to define new scheduling operations externally to the compiler. By composing a set of trusted, fine-grained primitives, users can safely write their own scheduling library to build up desired automation. We identify actions (ways of modifying code), inspection (ways of interrogating code), and references (ways of pointing to code) as essential for any user-extensible USL. We fuse these ideas into a new mechanism called Cursors that enables the creation of scheduling libraries in user code. We demonstrate libraries that amortize scheduling effort across more than 80 high-performance kernels, reducing total scheduling code by an order of magnitude and delivering performance competitive with state-of-the-art implementations on three different platforms.

en cs.PL

arXiv Open Access 2024

Topoi of automata I: Four topoi of automata and regular languages

Ryuya Hora

Both topos theory and automata theory are known for their multi-faceted nature and relationship with topology, algebra, logic, and category theory. This paper aims to clarify the topos-theoretic aspects of automata theory, particularly demonstrating through two main theorems how regular (and non-regular) languages arise in topos-theoretic calculation. First, it is shown that the four different notions of automata form four types of Grothendieck topoi, illustrating how the technical details of automata theory are described by topos theory. Second, we observe that the four characterizations of regular languages (DFA, Myhill-Nerode theorem, finite monoids, profinite words) provide Morita-equivalent definitions of a single Boolean-ringed topos, situating this within the context of Olivia Caramello's 'Toposes as Bridges.' This paper also serves as a preparation for follow-up papers, which deal with the relationship between hyperconnected geometric morphisms and algebraic/geometric aspects of formal language theory.

en cs.FL, math.CT

arXiv Open Access 2023

A feasible and unitary quantum programming language

Alejandro Díaz-Caro, Emmanuel Hainry, Romain Péchoux et al.

We introduce a novel quantum programming language featuring higher-order programs and quantum control flow which ensures that all qubit transformations are unitary. Our language boasts a type system guaranteeing both unitarity and polynomial-time normalization. Unitarity is achieved by using a special modality for superpositions while requiring orthogonality among superposed terms. Polynomial-time normalization is achieved using a linear-logic-based type discipline employing Barber and Plotkin duality along with a specific modality to account for potential duplications. This type discipline also guarantees that derived values have polynomial size. Our language seamlessly combines the two modalities: quantum circuit programs uphold unitarity, and all programs are evaluated in polynomial time, ensuring their feasibility.

en cs.LO, cs.PL

arXiv Open Access 2022

Lupa: A Framework for Large Scale Analysis of the Programming Language Usage

Anna Vlasova, Maria Tigina, Ilya Vlasov et al.

In this paper, we present Lupa - a framework for large-scale analysis of the programming language usage. Lupa is a command line tool that uses the power of the IntelliJ Platform under the hood, which gives it access to powerful static analysis tools used in modern IDEs. The tool supports custom analyzers that process the rich concrete syntax tree of the code and can calculate its various features: the presence of entities, their dependencies, definition-usage chains, etc. Currently, Lupa supports analyzing Python and Kotlin, but can be extended to other languages supported by IntelliJ-based IDEs. We explain the internals of the tool, show how it can be extended and customized, and describe an example analysis that we carried out with its help: analyzing the syntax of ranges in Kotlin.

en cs.PL, cs.SE

arXiv Open Access 2022

Automatic Differentiation for ML-family languages: correctness via logical relations

Fernando Lucatelli Nunes, Matthijs Vákár

We give a simple, direct and reusable logical relations technique for languages with term and type recursion and partially defined differentiable functions. We demonstrate it by working out the case of Automatic Differentiation (AD) correctness: namely, we present a correctness proof of a dual numbers style AD code transformation for realistic functional languages in the ML-family. We also show how this code transformation provides us with correct forward- and reverse-mode AD. The starting point is to interpret a functional programming language as a suitable freely generated categorical structure. In this setting, by the universal property of the syntactic categorical structure, the dual numbers AD code transformation and the basic $ω$-cpo semantics arise as structure preserving functors. The proof follows, then, by a novel logical relations argument. The key to much of our contribution is a powerful monadic logical relations technique for term recursion and recursive types. It provides us with a semantic correctness proof based on a simple approach for denotational semantics, making use only of the very basic concrete model of $ω$-cpos.

en cs.PL

arXiv Open Access 2022

When Language Model Meets Private Library

Daoguang Zan, Bei Chen, Zeqi Lin et al.

With the rapid development of pre-training techniques, a number of language models have been pre-trained on large-scale code corpora and perform well in code generation. In this paper, we investigate how to equip pre-trained language models with the ability of code generation for private libraries. In practice, it is common for programmers to write code using private libraries. However, this is a challenge for language models since they have never seen private APIs during training. Motivated by the fact that private libraries usually come with elaborate API documentation, we propose a novel framework with two modules: the APIRetriever finds useful APIs, and then the APICoder generates code using these APIs. For APIRetriever, we present a dense retrieval system and also design a friendly interaction to involve users. For APICoder, we can directly use off-the-shelf language models, or continually pre-train the base model on a code corpus containing API information. Both modules are trained with data from public libraries and can be generalized to private ones. Furthermore, we craft three benchmarks for private libraries, named TorchDataEval, MonkeyEval, and BeatNumEval. Experimental results demonstrate the impressive performance of our framework.

en cs.PL, cs.CL

arXiv Open Access 2021

Gillian: A Multi-Language Platform for Unified Symbolic Analysis

Petar Maksimović, José Fragoso Santos, Sacha-Élie Ayoun et al.

This is an evolving document describing the meta-theory, the implementation, and the instantiations of Gillian, a multi-language symbolic analysis platform.

en cs.PL, cs.LO

arXiv Open Access 2020

TF-Coder: Program Synthesis for Tensor Manipulations

Kensen Shi, David Bieber, Rishabh Singh

The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch that make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks, to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.
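
The search strategy named in this abstract, bottom-up weighted enumeration with value-based pruning, can be illustrated on a toy domain. The sketch below is not TF-Coder and uses no TensorFlow operations: it enumerates integer arithmetic expressions in order of increasing weight and discards any expression whose value has already been produced by a cheaper one. The operation table `OPS`, its weights, and the function name `bottom_up_search` are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical operation table: symbol -> (weight, implementation).
OPS: Dict[str, Tuple[int, Callable[[int, int], int]]] = {
    "+": (1, lambda a, b: a + b),
    "-": (1, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
}


def bottom_up_search(
    inputs: Dict[str, int], target: int, max_weight: int = 8
) -> Optional[str]:
    """Return a minimum-weight expression over `inputs` whose value is `target`."""
    # by_weight[w] holds (expression, value) pairs of total weight exactly w.
    by_weight: Dict[int, List[Tuple[str, int]]] = {
        w: [] for w in range(1, max_weight + 1)
    }
    seen: Set[int] = set()  # values already reached by some cheaper expression
    for name, value in inputs.items():
        if value == target:
            return name
        if value not in seen:
            seen.add(value)
            by_weight[1].append((name, value))
    for weight in range(2, max_weight + 1):
        for symbol, (cost, fn) in OPS.items():
            # Split the remaining weight between the left and right operands.
            for left_w in range(1, weight - cost):
                right_w = weight - cost - left_w
                for left_expr, left_val in by_weight[left_w]:
                    for right_expr, right_val in by_weight[right_w]:
                        value = fn(left_val, right_val)
                        if value in seen:
                            continue  # value-based pruning of equivalents
                        seen.add(value)
                        expr = f"({left_expr} {symbol} {right_expr})"
                        if value == target:
                            return expr
                        by_weight[weight].append((expr, value))
    return None


EXAMPLE_INPUTS = {"x": 3, "y": 4}
FOUND = bottom_up_search(EXAMPLE_INPUTS, target=13)
```

Because expressions are generated weight by weight, the first match has minimal total weight under the chosen table. Lowering the weight of an operation makes the search reach expressions that use it sooner, which is loosely the role the abstract assigns to learned operation priorities.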

en cs.PL, cs.LG

arXiv Open Access 2018

Word Problem Languages for Free Inverse Monoids

Tara Brough

This paper considers the word problem for free inverse monoids of finite rank from a language theory perspective. It is shown that no free inverse monoid has context-free word problem; that the word problem of the free inverse monoid of rank $1$ is both $2$-context-free (an intersection of two context-free languages) and ET0L; that the co-word problem of the free inverse monoid of rank $1$ is context-free; and that the word problem of a free inverse monoid of rank greater than $1$ is not poly-context-free.

en math.GR, cs.FL

arXiv Open Access 2017

Label Languages of 8-directional Array P System

William Suresh Kumar, Kalpana Mahalingam, Raghavan Rama

An 8-directional array P system is one where the rewriting of an array can happen in any of 8 directions. The array rules of such a system are labelled, thus resulting in a labelled 8-directional array P system. The labelling is not unique and the label language is obtained by recording the strings over the labels used in any terminating derivation of the P system. The system is shown to generate interesting pictures. The label language is compared with the Chomsky hierarchy.

en cs.FL

arXiv Open Access 2014

More Structural Characterizations of Some Subregular Language Families by Biautomata

Markus Holzer, Sebastian Jakobi

We study structural restrictions on biautomata such as, e.g., acyclicity, permutation-freeness, strongly permutation-freeness, and orderability, to mention a few. We compare the obtained language families with those induced by deterministic finite automata with the same property. In some cases, it is shown that there is no difference in characterization between deterministic finite automata and biautomata, as for permutation-freeness, but there are also other cases where it makes a big difference whether one considers deterministic finite automata or biautomata. This is, for instance, the case when comparing strongly permutation-freeness, which results in the family of definite languages for deterministic finite automata, while biautomata induce the family of finite and co-finite languages. The obtained results nicely fall into the known landscape on classical language families.

en cs.FL

arXiv Open Access 2012

SL: a "quick and dirty" but working intermediate language for SVP systems

Raphael Poss

The CSA group at the University of Amsterdam has developed SVP, a framework to manage and program many-core and hardware multithreaded processors. In this article, we introduce the intermediate language SL, a common vehicle to program SVP platforms. SL is designed as an extension to the standard C language (ISO C99/C11). It includes primitive constructs to bulk create threads, bulk synchronize on termination of threads, and communicate using word-sized dataflow channels between threads. It is intended for use as target language for higher-level parallelizing compilers. SL is a research vehicle; as of this writing, it is the only interface language to program a main SVP platform, the new Microgrid chip architecture. This article provides an overview of the language, to complement a detailed specification available separately.

en cs.PL, cs.DC

arXiv Open Access 2010

Rewriting Logic Semantics of a Plan Execution Language

Gilles Dowek, César Muñoz, Camilo Rocha

The Plan Execution Interchange Language (PLEXIL) is a synchronous language developed by NASA to support autonomous spacecraft operations. In this paper, we propose a rewriting logic semantics of PLEXIL in Maude, a high-performance logical engine. The rewriting logic semantics is by itself a formal interpreter of the language and can be used as a semantic benchmark for the implementation of PLEXIL executives. The implementation in Maude has the additional benefit of making available to PLEXIL designers and developers all the formal analysis and verification tools provided by Maude. The formalization of the PLEXIL semantics in rewriting logic poses an interesting challenge due to the synchronous nature of the language and the prioritized rules defining its semantics. To overcome this difficulty, we propose a general procedure for simulating synchronous set relations in rewriting logic that is sound and, for deterministic relations, complete. We also report on two issues at the design level of the original PLEXIL semantics that were identified with the help of the executable specification in Maude.

en cs.PL, cs.LO

arXiv Open Access 2009

On Pebble Automata for Data Languages with Decidable Emptiness Problem

Tony Tan

In this paper we study a subclass of pebble automata (PA) for data languages for which the emptiness problem is decidable. Namely, we introduce the so-called top view weak PA. Roughly speaking, top view weak PA are weak PA where the equality test is performed only between the data values seen by the two most recently placed pebbles. The emptiness problem for this model is decidable. We also show that it is robust: alternating, nondeterministic and deterministic top view weak PA have the same recognition power. Moreover, this model is strong enough to accept all data languages expressible in Linear Temporal Logic with the future-time operators, augmented with one register freeze quantifier.

en cs.FL
