Results for "Comparative grammar"

Showing 20 of ~3702961 results · from arXiv, DOAJ, Semantic Scholar, CrossRef

arXiv Open Access 2026
Context-Free Grammar Inference for Complex Programming Languages in Black Box Settings

Feifei Li, Xiao Chen, Xiaoyu Sun et al.

Grammar inference for complex programming languages remains a significant challenge, as existing approaches fail to scale to real world datasets within practical time constraints. In our experiments, none of the state-of-the-art tools, including Arvada, Treevada and Kedavra were able to infer grammars for complex languages such as C, C++, and Java within 48 hours. Arvada and Treevada perform grammar inference directly on full-length input examples, which proves inefficient for large files commonly found in such languages. While Kedavra introduces data decomposition to create shorter examples for grammar inference, its lexical analysis still relies on the original inputs. Additionally, its strict no-overgeneralization constraint limits the construction of complex grammars. To overcome these limitations, we propose Crucio, which builds a decomposition forest to extract short examples for lexical and grammar inference via a distributional matrix. Experimental results show that Crucio is the only method capable of successfully inferring grammars for complex programming languages (where the number of nonterminals is up to 23x greater than in prior benchmarks) within reasonable time limits. On the prior simple benchmark, Crucio achieves an average recall improvement of 1.37x and 1.19x over Treevada and Kedavra, respectively, and improves F1 scores by 1.21x and 1.13x.

en cs.PL
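
The recall and F1 figures quoted in the Crucio abstract can be illustrated with a small sketch. It assumes the sampling-based definitions commonly used for black-box grammar inference (recall: share of held-out valid inputs the inferred grammar accepts; precision: share of strings sampled from the inferred grammar that the black-box oracle accepts). The abstract does not spell these out, and the toy language and all function names here are hypothetical.

```python
from typing import Callable, Iterable


def acceptance_rate(accepts: Callable[[str], bool], samples: Iterable[str]) -> float:
    """Fraction of the samples that the given recognizer accepts."""
    items = list(samples)
    return sum(1 for s in items if accepts(s)) / len(items)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def is_balanced(s: str) -> bool:
    """Toy black-box oracle (assumption): balanced parentheses."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def inferred_accepts(s: str) -> bool:
    """Toy over-general 'inferred grammar' (assumption)."""
    return s.startswith("(") and s.endswith(")")


golden_samples = ["()", "(())", "()()"]          # held-out valid inputs
inferred_samples = ["()", "(())", "(()", "())"]  # strings sampled from the inferred grammar

recall = acceptance_rate(inferred_accepts, golden_samples)
precision = acceptance_rate(is_balanced, inferred_samples)
f1 = f1_score(precision, recall)
```

In this toy case the inferred grammar accepts every valid sample but also generates unbalanced strings, so recall is perfect while precision, and with it F1, drops.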
S2 Open Access 2023
A Scenario-Generic Neural Machine Translation Data Augmentation Method

Xiner Liu, Jia He, Mingzhe Liu et al.

Amid the rapid advancement of neural machine translation, the challenge of data sparsity has been a major obstacle. To address this issue, this study proposes a general data augmentation technique for various scenarios. It examines the predicament of parallel corpora diversity and high quality in both rich- and low-resource settings, and integrates the low-frequency word substitution method and reverse translation approach for complementary benefits. Additionally, this method improves the pseudo-parallel corpus generated by the reverse translation method by substituting low-frequency words and includes a grammar error correction module to reduce grammatical errors in low-resource scenarios. The experimental data are partitioned into rich- and low-resource scenarios at a 10:1 ratio. It verifies the necessity of grammatical error correction for pseudo-corpus in low-resource scenarios. Models and methods are chosen from the backbone network and related literature for comparative experiments. The experimental findings demonstrate that the data augmentation approach proposed in this study is suitable for both rich- and low-resource scenarios and is effective in enhancing the training corpus to improve the performance of translation tasks.

73 citations en
arXiv Open Access 2025
A New Graph Grammar Formalism for Robust Syntactic Pattern Recognition

Peter Fletcher

I introduce a formalism for representing the syntax of recursively structured graph-like patterns. It does not use production rules, like a conventional graph grammar, but represents the syntactic structure in a more direct and declarative way. The grammar and the pattern are both represented as networks, and parsing is seen as the construction of a homomorphism from the pattern to the grammar. The grammars can represent iterative, hierarchical and nested recursive structure in more than one dimension. This supports a highly parallel style of parsing, in which all aspects of pattern recognition (feature detection, segmentation, parsing, filling in missing symbols, top-down and bottom-up inference) are integrated into a single process, to exploit the synergy between them. The emphasis of this paper is on underlying theoretical issues, but I also give some example runs to illustrate the error-tolerant parsing of complex recursively structured patterns of 50-1000 symbols, involving variability in geometric relationships, blurry and indistinct symbols, overlapping symbols, cluttered images, and erased patches.

en cs.FL, cs.CV
arXiv Open Access 2025
AnnoGram: An Annotative Grammar of Graphics Extension

Md Dilshadur Rahman, Md Rahat-uz-Zaman, Andrew McNutt et al.

Annotations are central to effective data communication, yet most visualization tools treat them as secondary constructs -- manually defined, difficult to reuse, and loosely coupled to the underlying visualization grammar. We propose a declarative extension to Wilkinson's Grammar of Graphics that reifies annotations as first-class design elements, enabling structured specification of annotation targets, types, and positioning strategies. To demonstrate the utility of our approach, we develop a prototype extension called Vega-Lite Annotation. Through comparison with eight existing tools, we show that our approach enhances expressiveness, reduces authoring effort, and enables portable, semantically integrated annotation workflows.

en cs.HC, cs.GR
arXiv Open Access 2024
Kajal: Extracting Grammar of a Source Code Using Large Language Models

Mohammad Jalili Torkamani

Understanding and extracting the grammar of a domain-specific language (DSL) is crucial for various software engineering tasks; however, manually creating these grammars is time-intensive and error-prone. This paper presents Kajal, a novel approach that automatically infers grammar from DSL code snippets by leveraging Large Language Models (LLMs) through prompt engineering and few-shot learning. Kajal dynamically constructs input prompts, using contextual information to guide the LLM in generating the corresponding grammars, which are iteratively refined through a feedback-driven approach. Our experiments show that Kajal achieves 60% accuracy with few-shot learning and 45% without it, demonstrating the significant impact of few-shot learning on the tool's effectiveness. This approach offers a promising solution for automating DSL grammar extraction, and future work will explore using smaller, open-source LLMs and testing on larger datasets to further validate Kajal's performance.

en cs.SE, cs.AI
arXiv Open Access 2024
Counting on General Run-Length Grammars

Gonzalo Navarro, Alejandro Pacheco

We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in a text of length $n$ in time $O(m\log^{2+\epsilon} n)$, for any constant $\epsilon > 0$ chosen at indexing time. This is the first solution to an open problem posed by Christiansen et al. [ACM TALG 2020] and enhances our abilities for computation over compressed data; we give an example application.

en cs.DS
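
As a rough illustration of the objects in the abstract above, the sketch below models a run-length context-free grammar with three rule forms (a terminal, a pair of symbols, and a run `B^t`) and counts pattern occurrences by decompressing first. That naive counter is only a baseline for comparison; it is not the paper's data structure, which avoids decompression. The rule encoding and all names are assumptions made for this example.

```python
from functools import lru_cache

# Hypothetical encoding: ("term", ch), ("pair", left, right), ("run", symbol, times)
RULES = {
    "A": ("term", "a"),
    "B": ("term", "b"),
    "X": ("pair", "A", "B"),  # expands to "ab"
    "Y": ("run", "X", 3),     # expands to "ababab"
    "S": ("pair", "Y", "A"),  # expands to "abababa"
}


@lru_cache(maxsize=None)
def length(sym: str) -> int:
    """Length of the expansion, computed on the grammar without decompressing."""
    rule = RULES[sym]
    if rule[0] == "term":
        return 1
    if rule[0] == "pair":
        return length(rule[1]) + length(rule[2])
    return length(rule[1]) * rule[2]  # run rule


def expand(sym: str) -> str:
    """Fully decompress a symbol (size can be exponential in the grammar size)."""
    rule = RULES[sym]
    if rule[0] == "term":
        return rule[1]
    if rule[0] == "pair":
        return expand(rule[1]) + expand(rule[2])
    return expand(rule[1]) * rule[2]  # run rule


def count_naive(pattern: str, sym: str = "S") -> int:
    """Baseline: count (possibly overlapping) occurrences after decompression."""
    text = expand(sym)
    return sum(text.startswith(pattern, i) for i in range(len(text) - len(pattern) + 1))
```

Here `length("S")` is obtained from the rules alone, while `count_naive` needs the full text, which is the cost a compressed counting structure is meant to avoid.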
DOAJ Open Access 2024
Translators’ and interpreters’ engagement with professional development in Australia: An analysis of key factors

Jim Hlavac, Shani Tobias, Lola Sundin et al.

Professional development aims to facilitate the maintenance, improvement and broadening of knowledge and skills, and has become a standard or even compulsory component of professional practice for many occupational groups. This paper traces the uptake of professional development amongst certified translators and interpreters in Australia, where in 2014 it was introduced as a requirement for newly-certified practitioners only, and in 2019 for all holders of translation or interpreting certification from the national certifying authority. Based on responses gained from a sample of 3,268 practitioners, we report high uptake overall with little variation according to level of qualifications. Slightly lower uptake rates are recorded only amongst ‘newcomers’ with less experience while for all others, it is consistently high. Lower uptake rates are recorded amongst those who work 1-10 hours per week and those earning up to A$10,000 per year compared to others working more hours and those earning more. A desire for more work does not co-occur with elevated levels of PD uptake. The data presented reflects the reported experiences of those who had already been required to engage with PD, those for whom this requirement was new with a three-year time window to undertake PD, as well as those for whom it still remains optional. These findings contribute to our understanding of PD uptake amongst a professional group whose engagement with post-certification training has been under-studied. Findings may inform relevant stakeholders in other countries considering measures to arrest atrophy and extend the skill sets of practising translators and interpreters.

Translating and interpreting
DOAJ Open Access 2024
Marc Angenot : La rhétorique à l’épreuve de l’histoire des idées

Marc Angenot, Marianne Doury, Théophile Robineau

In this interview with Marianne Doury and Théophile Robineau, Marc Angenot reflects on the place of rhetoric in his work. Recalling that classical rhetoric is based essentially on the judicial model, he highlights its value, but also its limits, for exploring the social discourse he seeks to account for. He adopts its all-encompassing perspective, which mobilizes ethos, pathos and logos; considering the three together is necessary for understanding ideas and the way they are carried and debated in society. But he insists that social discourse must be understood as embedded in a history of longer or shorter duration, which is a condition of its intelligibility. Taking social discourse as an object of research also requires reconsidering the notion of situation as rhetoric traditionally conceives it, and redefining how the question of persuasion is viewed.

Style. Composition. Rhetoric
arXiv Open Access 2023
UI Layout Generation with LLMs Guided by UI Grammar

Yuwen Lu, Ziang Tong, Qinyi Zhao et al.

The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we proposed to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects.

en cs.HC, cs.AI
arXiv Open Access 2023
Closure Properties of General Grammars -- Formally Verified

Martin Dvorak, Jasmin Blanchette

We formalized general (i.e., type-0) grammars using the Lean 3 proof assistant. We defined basic notions of rewrite rules and of words derived by a grammar, and used grammars to show closure of the class of type-0 languages under four operations: union, reversal, concatenation, and the Kleene star. The literature mostly focuses on Turing machine arguments, which are possibly more difficult to formalize. For the Kleene star, we could not follow the literature and came up with our own grammar-based construction.
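
For closure under union, the textbook grammar-based construction can be sketched as follows. This is an illustrative Python model, not the paper's Lean 3 development, and the `Grammar` representation is an assumption. The nonterminals of the two grammars are made disjoint by tagging, and a fresh start symbol gets one rule to each original start symbol. Correctness relies on every left-hand side containing at least one nonterminal, so rules of one grammar can never fire inside a derivation of the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset
    start: object
    rules: tuple  # pairs (lhs, rhs); each side is a tuple of symbols


def tag(g: Grammar, k: int) -> Grammar:
    """Rename every nonterminal n to (k, n); terminals are left unchanged."""
    def ren(sym):
        return (k, sym) if sym in g.nonterminals else sym

    return Grammar(
        frozenset((k, n) for n in g.nonterminals),
        (k, g.start),
        tuple((tuple(map(ren, lhs)), tuple(map(ren, rhs))) for lhs, rhs in g.rules),
    )


def union(g1: Grammar, g2: Grammar) -> Grammar:
    """Grammar for L(g1) ∪ L(g2): fresh start S0 with S0 -> S1 and S0 -> S2."""
    a, b = tag(g1, 1), tag(g2, 2)
    start = (0, "S")
    new_rules = (((start,), (a.start,)), ((start,), (b.start,)))
    return Grammar(a.nonterminals | b.nonterminals | {start}, start, new_rules + a.rules + b.rules)


# Toy inputs (assumptions): a* and b*
g_a = Grammar(frozenset({"S"}), "S", ((("S",), ("a", "S")), (("S",), ())))
g_b = Grammar(frozenset({"S"}), "S", ((("S",), ("b", "S")), (("S",), ())))
g_u = union(g_a, g_b)
```

Both inputs use the nonterminal name `S`, so the tagging step is what keeps their rules from interfering in the combined grammar.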

DOAJ Open Access 2023
Sartrean Ethics and Emotive Nuisance in Kafkaesque World

Muhammad Adnan Akbar, Maria Farooq Maan

This study investigates the integration of Sartrean ethical principles in Kafka's literary works and challenges the usefulness of existentialist ethics. Sartre's Notebook for An Ethics (1983) argues that existentialism is a practical ethical theory that challenges the separation of theoretical and practical aspects. Warnock echoes this in Existential Ethics (1967). By examining key works by Sartre, including Existentialism and Humanism (1946) and Being and Nothingness (1943), the research explores the fundamental concepts of Sartrean ethics, which include freedom, bad faith, responsibility, and anguish. Sartre rejects absolute values, prioritizing subjectivity while acknowledging authenticity and good faith. Although Kierkegaard and Heidegger do not explicitly address existential ethics, they contribute to ethical concerns. The study employs a qualitative phenomenological approach, emphasizing Ricoeur's hermeneutic phenomenology. The theoretical framework is based on Sartre's Ethics and Emotive Nuisance concepts. The epistemological position aligns with Heidegger's interpretive technique. In the Kafkaesque World, characters struggle with existential perplexity amid modern-age horrors, exploring the traumas of existence. The research develops systematic frameworks to understand the ethical standpoint of this world, where characters face entanglement in chaotic existential paraphernalia and emotive nuisance. Emotions linked to existential ethics are examined to clarify the impact of emotion on ethical conduct.

English literature, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2023
Introduction special issue: marking the truth: a cross-linguistic approach to verum

Jordanoska Izabela, Kocher Anna, Bendezú-Araujo Raúl

This special issue focuses on the theoretical and empirical underpinnings of truth-marking. The names that have been used to refer to this phenomenon include, among others, counter-assertive focus, polar(ity) focus, verum focus, emphatic polarity or simply verum. This terminological variety is suggestive of the wide range of ideas and conceptions that characterizes this research field. This collection aims to get closer to the core of what truly constitutes verum. We want to expand the empirical base and determine the common and diverging properties of truth-marking in the languages of the world. The objective is to set a theoretical and empirical baseline for future research on verum and related phenomena.

Language. Linguistic theory. Comparative grammar
S2 Open Access 2001
Noncoding RNA gene detection using comparative sequence analysis

Elena Rivas, S. Eddy

Background: Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Noncoding RNA gene sequences do not have strong statistical signals, unlike protein coding genes. A reliable general purpose computational genefinder for noncoding RNA genes has been elusive. Results: We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g. from a BLASTN comparison of two related genomes) we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class. Conclusions: We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability.

545 citations en Medicine, Biology
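
The classification step described in the abstract above (choosing among the RNA, coding, and null models by posterior probability) can be sketched with Bayes' rule. This is a minimal illustration, not QRNA's implementation: the log-likelihood values are made up, the uniform prior is an assumption, and the function names are hypothetical. In the paper the likelihoods would come from scoring the alignment under each pair-grammar.

```python
import math


def posteriors(log_likelihoods, priors=None):
    """Posterior P(model | alignment) from per-model log-likelihoods.

    Uses the log-sum-exp trick so very negative log-likelihoods do not underflow.
    """
    names = list(log_likelihoods)
    if priors is None:  # assumption: uniform prior over the models
        priors = {m: 1.0 / len(names) for m in names}
    scores = {m: log_likelihoods[m] + math.log(priors[m]) for m in names}
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {m: math.exp(s - log_norm) for m, s in scores.items()}


def classify(log_likelihoods, priors=None):
    """Return the model name with the highest posterior probability."""
    post = posteriors(log_likelihoods, priors)
    return max(post, key=post.get)


# Made-up log-likelihoods of one alignment under the three models
example = {"rna": -120.0, "coding": -125.0, "null": -123.0}
post = posteriors(example)
label = classify(example)
```

With a uniform prior the winner is simply the model with the highest likelihood; a non-uniform prior can be passed as `priors` to shift the decision.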
arXiv Open Access 2022
Nested Sequents for Intuitionistic Grammar Logics via Structural Refinement

Tim S. Lyon

Intuitionistic grammar logics fuse constructive and multi-modal reasoning while permitting the use of converse modalities, serving as a generalization of standard intuitionistic modal logics. In this paper, we provide definitions of these logics as well as establish a suitable proof theory thereof. In particular, we show how to apply the structural refinement methodology to extract cut-free nested sequent calculi for intuitionistic grammar logics from their semantics. This method proceeds by first transforming the semantics of these logics into sound and complete labeled sequent systems, which we prove have favorable proof-theoretic properties such as syntactic cut-elimination. We then transform these labeled systems into nested sequent systems via the introduction of propagation rules and the elimination of structural rules. Our derived proof systems are then put to use, whereby we prove the conservativity of intuitionistic grammar logics over their modal counterparts, establish the general undecidability of these logics, and recognize a decidable subclass, referred to as "simple" intuitionistic grammar logics.

en cs.LO, cs.DM
DOAJ Open Access 2022
Reconceptualizing breaks in translation: Breaking down or breaking through?

Erik Angelone, Álvaro Marín García

Various observable on-screen translator behaviors, such as extended pauses in activity, mouse hovering, cycling through tabs/windows, and different kinds of scrolling, all common occurrences during task completion, have been regarded as potential problem indicators (cf. Angelone, 2018). Their presence is often attributed to a breakdown in declarative and/or procedural knowledge at a concrete problem nexus (Angelone and Shreve, 2011). Inspired by recent translation process research on aspects of cognitive ergonomics, pause-related cognitive rhythms (Muñoz and Cardona, 2018), and Kussmaul’s notion of parallel activity in the translation process (1995), we re-examine such phenomena through a different lens. We propose these phenomena may represent the loci of volitional, potentially strategic breaks rather than problem indicators per se. That is, the breaks observed are not necessarily linked to specific problems, but rather to subjects’ cognitive resource management. Our findings suggest that apparently random behaviors, seemingly unrelated to the task, generally have a positive impact on performance from both process and product perspectives. We refer to these breaks as instances of cognitive suspension, and, based on our findings, propose that translators engage in them as a refresh mechanism when performance has either waned or runs the risk of doing so. We start by examining cognitive suspension in terms of types and scope. This is followed by an empirical analysis of its direct impact on translation performance, as established by number of errors, number of generated characters, and number of typos within established windows (areas of interest) that precede and follow its occurrence.

Translating and interpreting
DOAJ Open Access 2022
La reflexión metalingüística como recurso humorístico en el discurso televisivo

Lucía Luque Nadal

Humorous discourse uses various strategies and linguistic resources to make television viewers laugh. This article studies metalinguistic reflection as an element that creates and gives cohesion to humorous discourse. To that end, it analyzes a corpus of 50 examples taken from a well-known Spanish comedy series and classifies the examples of metalinguistic reflection according to the elements at work in each (polysemy, folk etymologies, euphemisms, metalinguistic expressions, etc.). In terms of linguistic creativity, the study finds that metalinguistic reflection operates as a highly dynamic resource for creating humorous situations.

Romanic languages, Philology. Linguistics
arXiv Open Access 2021
Pregroup Grammars, their Syntax and Semantics

Mehrnoosh Sadrzadeh

Pregroup grammars were developed in 1999 and stayed Lambek's preferred algebraic model of grammar. The set-theoretic semantics of pregroups, however, faces an ambiguity problem. In his latest book, Lambek suggests that this problem might be overcome using finite dimensional vector spaces rather than sets. What is the right notion of composition in this setting, direct sum or tensor product of spaces?

arXiv Open Access 2021
Shape Inference and Grammar Induction for Example-based Procedural Generation

Gillis Hermans, Thomas Winters, Luc De Raedt

Designers increasingly rely on procedural generation for automatic generation of content in various industries. These techniques require extensive knowledge of the desired content, and about how to actually implement such procedural methods. Algorithms for learning interpretable generative models from example content could alleviate both difficulties. We propose SIGI, a novel method for inferring shapes and inducing a shape grammar from grid-based 3D building examples. This interpretable grammar is well-suited for co-creative design. Applied to Minecraft buildings, we show how the shape grammar can be used to automatically generate new buildings in a similar style.

en cs.AI, cs.LG
