Rosa M. Rodríguez, Luis Martínez-López, F. Herrera
Results for "Comparative grammar"
Showing 20 of ~3,705,732 results · from DOAJ, Semantic Scholar, CrossRef, arXiv
Matteo Filosa, Graziano Blasilli, Emilio Martino et al.
Modern data analysis requires fast responses over massive datasets. Progressive Data Analysis and Visualization (PDAV) emerged as a discipline to address this problem, providing fast response times while maintaining interactivity with controlled accuracy. Yet it remains difficult to implement and reproduce. To lower this barrier, we present ProVega, a Vega-Lite-based grammar that simplifies PDAV instrumentation for both simple visualizations and complex visual environments. Alongside it, we introduce Pro-Ex, an editor designed to streamline the creation and analysis of progressive solutions. We validated ProVega by reimplementing 11 exemplars from the literature, verified for fidelity by 39 users, and by demonstrating its support for various progressive methods, including data-chunking, process-chunking, and mixed-chunking. An expert user study confirmed the efficacy of ProVega and the Pro-Ex environment in real-world tasks. ProVega, Pro-Ex, and all related materials are available at https://github.com/XAIber-lab/provega
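The data-chunking strategy the abstract names can be illustrated with a minimal, self-contained sketch in plain Python (this is not ProVega's grammar or API; `progressive_mean` and the chunk size are hypothetical): process the dataset in fixed-size chunks and emit a refined estimate after each one, so a visualization can render early and converge as more data arrives.

```python
def progressive_mean(data, chunk_size=1000):
    """Yield (fraction_processed, running_mean) after each chunk,
    so a chart can update long before the full dataset is scanned."""
    total, count = 0.0, 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield count / len(data), total / count

# Each yielded pair is a progressively more accurate estimate.
for done, estimate in progressive_mean(list(range(10_000)), chunk_size=2500):
    print(f"{done:.0%} processed, current mean estimate {estimate}")
```

The trade-off this sketch makes visible is exactly the one PDAV manages: early estimates are cheap but approximate, and accuracy improves monotonically with each processed chunk.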
Stefano Bannò, Penny Karanasou, Kate Knill et al.
Evaluating the grammatical competence of second language (L2) learners is essential both for providing targeted feedback and for assessing proficiency. To achieve this, we propose a novel framework leveraging the English Grammar Profile (EGP), a taxonomy of grammatical constructs mapped to the proficiency levels of the Common European Framework of Reference (CEFR), to detect learners' attempts at grammatical constructs and classify them as successful or unsuccessful. This detection can then be used to provide fine-grained feedback. Moreover, the grammatical constructs are used as predictors of proficiency assessment by using automatically detected attempts as predictors of holistic CEFR proficiency. For the selection of grammatical constructs derived from the EGP, rule-based and LLM-based classifiers are compared. We show that LLMs outperform rule-based methods on semantically and pragmatically nuanced constructs, while rule-based approaches remain competitive for constructs that rely purely on morphological or syntactic features and do not require semantic interpretation. For proficiency assessment, we evaluate both rule-based and hybrid pipelines and show that a hybrid approach combining a rule-based pre-filter with an LLM consistently yields the strongest performance. Since our framework operates on pairs of original learner sentences and their corrected counterparts, we also evaluate a fully automated pipeline using automatic grammatical error correction. This pipeline closely approaches the performance of semi-automated systems based on manual corrections, particularly for the detection of successful attempts at grammatical constructs. Overall, our framework emphasises learners' successful attempts in addition to unsuccessful ones, enabling positive, formative feedback and providing actionable insights into grammatical development.
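A minimal sketch of a rule-based construct detector of the kind the abstract compares against LLMs: flag attempts at the English comparative "X-er than" / "more X than" construct with a regular expression. This is purely illustrative (the pattern and function names are my own, and the EGP taxonomy is far richer); its false positives also illustrate why rule-based methods struggle with semantically nuanced constructs.

```python
import re

# Matches "taller than", "more interesting than", etc.
# Known weakness: also fires on non-comparatives like "other than",
# which is the kind of nuance where LLM classifiers do better.
COMPARATIVE = re.compile(r"\b(\w+er|more \w+)\s+than\b", re.IGNORECASE)

def detect_comparative(sentence):
    """Return True if the sentence looks like a comparative attempt."""
    return bool(COMPARATIVE.search(sentence))
```

Pairing such a cheap pre-filter with an LLM judge is, per the abstract, the hybrid pipeline that performed best.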
Perla Elízabet Ventura Ramos, Jesús Zaratoga Martínez, Norma Yadira Memije Alarcón
Introducción: La educación es un derecho humano esencial y una herramienta poderosa para el desarrollo personal y comunitario. Sin embargo, las mujeres y niñas indígenas enfrentan múltiples desafíos que limitan su acceso al sistema educativo y afectan su desarrollo integral. Métodos: Este estudio propone acciones de sensibilización para contribuir al desarrollo de las mujeres y niñas indígenas en Atliaca, Guerrero, México. A través de un enfoque cualitativo y explicativo, se analizó su contexto sociocultural y educativo. Se realizaron entrevistas y encuestas a mujeres, niñas, autoridades educativas, docentes, padres y líderes comunitarios, utilizando procedimientos de estadística descriptiva para identificar patrones clave. Resultados: Se identificaron cuatro aspectos principales: la brecha de género y etnia en el acceso, permanencia y logros educativos; los factores sociales, culturales, económicos y políticos que limitan su derecho a la educación; las iniciativas actuales para promover su educación; y los beneficios asociados con su desarrollo educativo. Conclusiones: Se plantean acciones de sensibilización que visibilicen los beneficios de la educación, generen espacios de diálogo y colaboración, brinden orientación y asesoría, y promuevan programas de apoyo y becas. Estas acciones buscan garantizar un desarrollo integral para las mujeres y niñas indígenas.
Adrian Amed García Jardines
Introducción: Este artículo explora la relación entre el paisaje y la memoria cultural en el litoral del Centro Histórico Urbano de Santiago de Cuba, centrándose en la avenida Jesús Menéndez. Se analiza cómo las antiguas edificaciones industriales, vinculadas a la producción de ron y cerveza, forman parte de un patrimonio que integra lo urbano, lo natural y lo simbólico, reflejando la identidad histórica y cultural de la región. Métodos: A partir de la metodología de Espinosa Ocallaghan y Gómez Ortega (2022), se evalúa la integración paisajística mediante un análisis de unidades paisajísticas y áreas de percepción visual. Este enfoque permite identificar la calidad visual y la visibilidad del paisaje, así como su relación con los valores patrimoniales y culturales del área. Resultados: El estudio identifica un uso de suelo mixto, con escasez de servicios y tres unidades de paisaje de alto valor, afectadas por la falta de atención. Estas unidades permiten jerarquizar los elementos clave para futuras intervenciones que integren la conservación del patrimonio con el desarrollo comunitario. Conclusiones: El litoral de Santiago de Cuba es un espacio cultural e histórico, por lo que se propone un modelo sostenible que involucre a la comunidad para su conservación y valorización como recurso clave de identidad local.
Vlatka Dugački
Leon Bettscheider, Andreas Zeller
Generating effective test inputs for a software system requires that these inputs be valid, as they will otherwise be rejected without reaching actual functionality. In the absence of a specification for the input language, common test generation techniques rely on sample inputs, which are abstracted into matching grammars and/or evolved guided by test coverage. However, if sample inputs miss features of the input language, the chances of generating these features randomly are slim. In this work, we present the first technique for symbolically and automatically mining input grammars from the code of recursive descent parsers. So far, the complexity of parsers has made such a symbolic analysis challenging or even impossible. Our realization of the symbolic parsing technique overcomes these challenges by (1) associating each parser function parse_ELEM() with a nonterminal <ELEM>; (2) limiting recursive calls and loop iterations, such that a symbolic analysis of parse_ELEM() needs to consider only a finite number of paths; and (3) creating, for each path, an expansion alternative for <ELEM>. Being purely static, symbolic parsing does not require seed inputs; as it mitigates path explosion, it scales to complex parsers. Our evaluation shows symbolic parsing to be highly accurate. Applied to parsers for complex languages such as TINY-C or JSON, our STALAGMITE implementation extracts grammars with an accuracy of 99--100%, widely improving over the state of the art despite requiring only the program code and no input samples. The resulting grammars cover the entire input space, allowing for comprehensive and effective test generation, reverse engineering, and documentation.
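The parse_ELEM()-to-<ELEM> association is easy to see on a toy recursive descent parser (hypothetical example; not the STALAGMITE code). Each parser function below corresponds to a nonterminal, and each execution path through a function corresponds to one expansion alternative, which is what a symbolic analysis would enumerate:

```python
# Grammar mined from the two functions below, one alternative per path:
#   <expr>  ::= <digit> | <digit> "+" <expr>
#   <digit> ::= "0" | "1" | ... | "9"

def parse_digit(s, i):
    """Nonterminal <digit>: a single path accepting one digit."""
    if i < len(s) and s[i].isdigit():
        return i + 1
    raise SyntaxError(f"expected digit at position {i}")

def parse_expr(s, i=0):
    """Nonterminal <expr>: two paths, hence two alternatives."""
    i = parse_digit(s, i)           # path 1: <digit>
    if i < len(s) and s[i] == "+":  # path 2: <digit> "+" <expr>
        return parse_expr(s, i + 1)
    return i

def accepts(s):
    """True iff the parser consumes the whole input."""
    try:
        return parse_expr(s) == len(s)
    except SyntaxError:
        return False
```

Bounding the recursion in parse_expr (step 2 above) is what keeps the number of paths, and thus alternatives, finite.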
Edi Muškardin, Tamim Burgstaller
We present PAPNI, a passive automata learning algorithm capable of learning deterministic context-free grammars, which are modeled with visibly deterministic pushdown automata. PAPNI is a generalization of RPNI, a passive automata learning algorithm capable of learning regular languages from positive and negative samples. PAPNI uses RPNI as its underlying learning algorithm while assuming a priori knowledge of the visibly deterministic input alphabet, that is, the alphabet decomposition into symbols that push to the stack, pop from the stack, or do not affect the stack. In this paper, we show how passive learning of deterministic pushdown automata can be viewed as a preprocessing step of standard RPNI implementations. We evaluate the proposed approach on various deterministic context-free grammars found in the literature and compare the predictive accuracy of learned models with RPNI.
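The a-priori alphabet decomposition PAPNI assumes can be sketched in a few lines (hypothetical alphabet and function names; this is not the PAPNI implementation): once every symbol is classified as a call (push), return (pop), or internal symbol, the stack behavior of any word is fixed by the word itself and can be checked before learning starts.

```python
# Visibly pushdown alphabet decomposition: the symbol alone
# determines the stack action, independent of automaton state.
CALLS, RETURNS = {"("}, {")"}   # push / pop symbols
INTERNAL = {"a", "b"}           # symbols that leave the stack untouched

def well_matched(word):
    """True iff pops never underflow and the stack ends empty."""
    depth = 0
    for sym in word:
        if sym in CALLS:
            depth += 1
        elif sym in RETURNS:
            if depth == 0:
                return False    # pop on an empty stack
            depth -= 1
        elif sym not in INTERNAL:
            raise ValueError(f"symbol {sym!r} not in the alphabet")
    return depth == 0
```

Because the stack content is determined by the input word, samples can be preprocessed into an annotated form that a regular-language learner like RPNI can handle, which is the reduction the abstract describes.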
Jannik Olbrich
The Burrows-Wheeler Transform (BWT) serves as the basis for many important sequence indexes. On very large datasets (e.g. genomic databases), classical BWT construction algorithms are often infeasible because they usually need to have the entire dataset in main memory. Fortunately, such large datasets are often highly repetitive. It can thus be beneficial to compute the BWT from a compressed representation. We propose an algorithm for computing the BWT via the Lyndon straight-line program, a grammar based on the standard factorization of Lyndon words. Our algorithm can also be used to compute the extended BWT (eBWT) of a multiset of sequences. We empirically evaluate our implementation and find that we can compute the BWT and eBWT of very large datasets faster and/or with less memory than competing methods.
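For reference, the BWT itself has a compact textbook definition as the last column of the sorted rotation matrix. The sketch below uses that naive definition (the paper's contribution is precisely avoiding it by working on a grammar-compressed representation, so this is illustrative only):

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel and
    return the last column. O(n^2 log n) -- fine for a demo,
    infeasible for the genomic-scale inputs the paper targets."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(bwt("banana"))  # classic example: "annb$aa"
```

The output groups identical characters into runs, which is why the BWT underlies compressed indexes and why computing it directly from a compressed input is attractive.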
Jos C. M. Baeten, Bas Luttik
A classical theorem states that the set of languages given by a pushdown automaton coincides with the set of languages given by a context-free grammar. In previous work, we proved the pendant of this theorem in a setting with interaction: the set of processes given by a pushdown automaton coincides with the set of processes given by a finite guarded recursive specification over a process algebra with actions, choice, sequencing and guarded recursion, if and only if we add sequential value passing. In this paper, we examine what happens if we consider parallel pushdown automata instead of pushdown automata, and a process algebra with parallelism instead of sequencing.
Garrett Tanzer, Mirac Suzgun, Eline Visser et al.
Large language models (LLMs) can perform impressive feats with in-context learning or lightweight finetuning. It is natural to wonder how well these models adapt to genuinely new tasks, but how does one find tasks that are unseen in internet-scale training sets? We turn to a field that is explicitly motivated and bottlenecked by a scarcity of web data: low-resource languages. In this paper, we introduce MTOB (Machine Translation from One Book), a benchmark for learning to translate between English and Kalamang -- a language with fewer than 200 speakers and therefore virtually no presence on the web -- using several hundred pages of field linguistics reference materials. This task framing is novel in that it asks a model to learn a language from a single human-readable book of grammar explanations, rather than a large mined corpus of in-domain data, more akin to L2 learning than L1 acquisition. We demonstrate that baselines using current LLMs are promising but fall short of human performance, achieving 44.7 chrF on Kalamang to English translation and 45.8 chrF on English to Kalamang translation, compared to 51.6 and 57.0 chrF by a human who learned Kalamang from the same reference materials. We hope that MTOB will help measure LLM capabilities along a new dimension, and that the methods developed to solve it could help expand access to language technology for underserved communities by leveraging qualitatively different kinds of data than traditional machine translation.
Benedikt Szmrecsanyi, Alexandra Engel
Abstract In this paper, we operationalize register differences at the intersection of formality and mode, and distinguish four broad register categories: spoken informal (conversations), spoken formal (parliamentary debates), written informal (blogs), and written formal (newspaper articles). We are specifically interested in the comparative probabilistic/variationist complexity of these registers – when speakers have grammatical choices, are the probabilistic grammars regulating these choices more or less complex in particular registers than in others? Based on multivariate modeling of richly annotated datasets covering three grammatical alternations in two languages (English and Dutch), we assess the complexity of probabilistic grammars by drawing on three criteria: (a) the number of constraints on variant choice, (b) the number of interactions between constraints, and (c) the relative importance of lexical conditioning. Analysis shows that contrary to theorizing in variationist sociolinguistics, probabilistic complexity differences between registers are not quantitatively simple: formal registers are consistently the most complex ones, while spoken registers are the least complex ones. The most complex register under study is written-formal quality newspaper writing. We submit that the complexity differentials we uncover are a function of acquisitional difficulty, of on-line processing limitations, and of normative pressures.
Alice Corr
This book examines how speakers of Ibero-Romance ‘do things’ with conversational units of language, paying particular attention to what they do with utterance-oriented elements such as vocatives, interjections, and particles; and to what they do with illocutionary complementizers, items attested cross-linguistically which look like, but do not behave like, subordinators. Taking the behaviour of conversation-oriented units of language as a window into the indexical nature of language, it argues that these items provide insight into how language-as-grammar builds the universe of discourse. By identifying the underlying unity in how different Ibero-Romance languages, alongside their Romance cousins and Latin ancestors, use grammar to refer—i.e. to connect our inner world to the one outside—, the book’s empirical arguments are underpinned by the philosophical position that the architecture of grammar is also the architecture of thought. The book thus brings together the recent flurry of work seeking to incorporate aspects of the context of the utterance into the syntax, a line of enquiry broadly founded on empirical considerations, with the pursuit of explanatory adequacy via a so-called ‘un-Cartesian’ grammar of reference. In so doing, it formalizes the intuition that language users do things not with words, but with grammar. The book brings new insight to the comparative morphosyntax of (Ibero-)Romance, particularly in its diatopic, diastratic, and diamesic dimensions, and showcases the utility of careful descriptive work on this language family in advancing our empirical and conceptual understanding of the organization of grammar.
Yibin Zhang
English has become one of the compulsory subjects for students in China. As a foreign language whose grammatical structure differs, in some respects, from learners’ mother tongue, it requires teachers to research proper methods of presenting syntactic patterns for students’ sake. When teachers turn to linguistics, there are two well-known theories of syntax from different perspectives: transformational-generative grammar, proposed by Chomsky, and systemic functional grammar, by Halliday. Since most beginners may find it challenging to be exposed to a totally new language that embraces foreign cultures, learners are expected to start with the most fundamental syntax: the five basic English sentence patterns. For teachers, it is necessary to analyze those sentence patterns and devise practical teaching methods so that they can help learners study more efficiently. In this sense, this essay is especially meaningful. This dissertation aims to reveal the potential relations between the two theories in analyzing the five sentence patterns, as part of the effort to seek more appropriate ways of discussing English syntactic features. It may also, hopefully, bring some enlightenment to teachers. The method this paper applies is comparative analysis. The research concludes that each of the two theories has its place in explaining different types of sentences.
Wenqing Zhu, Norihiro Yoshida, Toshihiro Kamiya et al.
For various reasons, programming languages continue to multiply and evolve, so a multilingual clone detection tool that can easily expand its supported programming languages and detect various code clones has become necessary. However, research on multilingual code clone detection has not received sufficient attention. In this study, we propose MSCCD (Multilingual Syntactic Code Clone Detector), a grammar-pluggable code clone detection tool that uses a parser generator to generate a code block extractor for the target language. The extractor then extracts the semantic code blocks from a parse tree. MSCCD can detect Type-3 clones at various granularities. We evaluated MSCCD's language extensibility by applying it to 20 modern languages: sixteen were perfectly supported, and the remaining four were supported with the same detection capabilities at the expense of execution time. We evaluated MSCCD's recall using BigCloneEval and conducted a manual experiment to evaluate precision. MSCCD achieved detection performance equivalent to that of state-of-the-art tools.
Jiaheng Hu, Julian Whiman, Howie Choset
Robots have been used in all sorts of automation, and yet the design of robots remains mainly a manual task. We seek to provide design tools to automate the design of robots themselves. An important challenge in robot design automation is the large and complex design search space, which grows exponentially with the number of components, making optimization difficult and sample-inefficient. In this work, we present Grammar-guided Latent Space Optimization (GLSO), a framework that transforms design automation into a low-dimensional continuous optimization problem by training a graph variational autoencoder (VAE) to learn a mapping between the graph-structured design space and a continuous latent space. This transformation allows optimization to be conducted in a continuous latent space, where sample efficiency can be significantly boosted by applying algorithms such as Bayesian Optimization. GLSO guides training of the VAE using graph grammar rules and robot world space features, such that the learned latent space focuses on valid robots and is easier for the optimization algorithm to explore. Importantly, the trained VAE can be reused to search for designs specialized to multiple different tasks without retraining. We evaluate GLSO by designing robots for a set of locomotion tasks in simulation, and demonstrate that our method outperforms related state-of-the-art robot design automation methods.
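The latent-space idea can be sketched in a few lines: a toy "decoder" and a simple hill-climbing search stand in for the trained VAE and Bayesian Optimization. Everything here (the decoder, the fitness function, all names) is hypothetical; this is not the GLSO implementation, only the shape of the reduction from discrete design search to continuous optimization.

```python
import random

def decode(z):
    """Stand-in for the VAE decoder: latent vector -> discrete design
    (here, e.g., a joint type in 0..3 per limb)."""
    return [round(abs(x)) % 4 for x in z]

def fitness(design):
    """Stand-in for a simulated locomotion reward."""
    return sum(design) - 2 * design.count(0)

def latent_search(dim=4, iters=200, seed=0):
    """Hill-climb in the continuous latent space; every candidate
    is decoded back to a discrete design before evaluation."""
    rng = random.Random(seed)
    best_z = [rng.gauss(0, 1) for _ in range(dim)]
    best_f = fitness(decode(best_z))
    for _ in range(iters):
        z = [x + rng.gauss(0, 0.3) for x in best_z]
        f = fitness(decode(z))
        if f > best_f:
            best_z, best_f = z, f
    return decode(best_z), best_f
```

The point of the construction is that the optimizer only ever sees a continuous vector, so sample-efficient continuous methods apply even though the underlying design space is discrete and graph-structured.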
Soe Thandar Aung, N. Funabiki, Y. Syaifudin et al.
Nowadays, Java has been extensively adopted in practical IT systems as a reliable and portable object-oriented programming language. To encourage self-study of Java programming, we have developed a Web-based Java Programming Learning Assistant System (JPLAS). JPLAS provides several types of exercises to cover different levels. However, no existing type directly questions the grammar concepts of a source code, although such questions can be the first step for novice students. In this paper, we propose a Grammar-Concept Understanding Problem (GUP) as a new type in JPLAS. A GUP instance consists of a source code and a set of questions on grammar concepts or behaviors of the code. Each answer can be a number, a word, or a short sentence, whose correctness is marked through string matching with the correct one. We present an algorithm to automatically generate a GUP instance from a given source code by: 1) extracting the registered keywords in the code, 2) selecting the registered question corresponding to each keyword, and 3) detecting the data required in the correct answer from the code. For evaluation, we first generate 20 GUP instances with a total of 99 questions from simple codes on fundamental Java grammar, and assign them to 100 university students in Indonesia. We additionally generate 8 instances with a total of 30 questions, and assign all the instances to 29 undergraduates in Myanmar as a comparative study. The results show that the proposal is effective in improving the performance of students who are novices in Java programming.
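The three-step generation algorithm above can be sketched as follows (keyword table, question wording, and answer-extraction rule are all hypothetical simplifications, not the JPLAS implementation):

```python
# Step 2's registered questions, keyed by step 1's registered keywords.
QUESTIONS = {
    "class": "What is the name of the declared class?",
    "int":   "Which variable is declared with type int?",
}

def generate_gup(source):
    """Steps 1-3: scan for registered keywords, pair each with its
    registered question, and read the correct answer off the code
    (in these simple cases, the token following the keyword)."""
    tokens = source.replace("{", " ").replace(";", " ").split()
    return [(QUESTIONS[tok], tokens[i + 1])
            for i, tok in enumerate(tokens) if tok in QUESTIONS]

def mark(answer, correct):
    """Marking by string matching, as described in the abstract."""
    return answer.strip() == correct.strip()
```

Running generate_gup on `"class Hello { int count = 0; }"` yields two question/answer pairs, one per registered keyword found.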
V. Vysotska, Svitlana Holoshchuk, Roman Holoshchuk
K. Kabel, Mette Vedsgaard Christensen, L. Brok
ABSTRACT Studies exploring grammar teaching in first and foreign language subjects in Scandinavia are very rare. In this article, we present findings from a focused ethnographic study (Gramma3, 2018–2019) of grammar teaching practices in the three major first and foreign language subjects at lower-secondary level (age 13–15) in Denmark: Danish L1, English L2 and German L3, with data collected at seven schools. The dominance of traditional school grammar content in all three classrooms is one main finding. However, the approaches vary across the three subjects, mirrored also in different traditions and cultures for language learning within first and foreign language subjects. The co-existence of concurrent and even contradictory practices within each language subject is another main finding. Thus, the cross-curricular perspective of the present study leads to detailed findings suggesting new ways of understanding explicit grammar teaching in compulsory education. In this way, the study helps to shed light on an under-researched, yet key curricular content area in all three subjects, suggesting opportunities for cooperation between first and foreign language teachers. In turn, it contributes knowledge, which is valuable beyond the national context of the study, with the potential for comparative studies across borders.
L. Domínguez, María J. Arche
In this paper we argue that Bley-Vroman’s Comparative Fallacy, which warns against comparisons between native speakers and learners in second-language acquisition (SLA) research, is not justified on either theoretical or methodological grounds and should be abandoned, as it contravenes the explanatory nature of SLA research. We argue that for SLA to be able to provide meaningful explanations, grammatical comparisons with a baseline (usually, though not always, native speakers) are not only justified but necessary, a position which we call the ‘Comparative Logic’. The methodological choices assumed by this position ensure that interlanguage grammars are analysed in their own right, respecting their own principles. Related issues, such as why we focus on the native speaker and why investigating deficits in linguistic-cognitive SLA is essential to our field, are also discussed.
Page 37 of 185287