Rosa M. Rodríguez, Luis Martínez-López, F. Herrera
Results for "Comparative grammar"
Showing 20 of ~3,705,732 results · from DOAJ, Semantic Scholar, CrossRef, arXiv
Matteo Filosa, Graziano Blasilli, Emilio Martino et al.
Modern data analysis requires fast responses over massive datasets. Progressive Data Analysis and Visualization (PDAV) emerged as a discipline to address this problem, providing fast response times while maintaining interactivity with controlled accuracy. Yet it remains difficult to implement and reproduce. To lower this barrier, we present ProVega, a Vega-Lite-based grammar that simplifies PDAV instrumentation for both simple visualizations and complex visual environments. Alongside it, we introduce Pro-Ex, an editor designed to streamline the creation and analysis of progressive solutions. We validated ProVega by reimplementing 11 exemplars from the literature, verified for fidelity by 39 users, and by demonstrating its support for various progressive methods, including data-chunking, process-chunking, and mixed-chunking. An expert user study confirmed the efficacy of ProVega and the Pro-Ex environment in real-world tasks. ProVega, Pro-Ex, and all related materials are available at https://github.com/XAIber-lab/provega
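The data-chunking strategy the abstract names can be illustrated with a minimal, self-contained sketch in plain Python (this is not ProVega's grammar or API; `progressive_mean` and the chunk size are hypothetical): process the dataset in fixed-size chunks and emit a refined estimate after each one, so a visualization can render early and converge as more data arrives.

```python
def progressive_mean(data, chunk_size=1000):
    """Yield (fraction_processed, running_mean) after each chunk,
    so a chart can update long before the full dataset is scanned."""
    total, count = 0.0, 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield count / len(data), total / count

# Each yielded pair is a progressively more accurate estimate.
for done, estimate in progressive_mean(list(range(10_000)), chunk_size=2500):
    print(f"{done:.0%} processed, current mean estimate {estimate}")
```

The trade-off this sketch makes visible is exactly the one PDAV manages: early estimates are cheap but approximate, and accuracy improves monotonically with each processed chunk.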
Stefano Bannò, Penny Karanasou, Kate Knill et al.
Evaluating the grammatical competence of second language (L2) learners is essential both for providing targeted feedback and for assessing proficiency. To achieve this, we propose a novel framework leveraging the English Grammar Profile (EGP), a taxonomy of grammatical constructs mapped to the proficiency levels of the Common European Framework of Reference (CEFR), to detect learners' attempts at grammatical constructs and classify them as successful or unsuccessful. This detection can then be used to provide fine-grained feedback. Moreover, the grammatical constructs are used as predictors of proficiency assessment by using automatically detected attempts as predictors of holistic CEFR proficiency. For the selection of grammatical constructs derived from the EGP, rule-based and LLM-based classifiers are compared. We show that LLMs outperform rule-based methods on semantically and pragmatically nuanced constructs, while rule-based approaches remain competitive for constructs that rely purely on morphological or syntactic features and do not require semantic interpretation. For proficiency assessment, we evaluate both rule-based and hybrid pipelines and show that a hybrid approach combining a rule-based pre-filter with an LLM consistently yields the strongest performance. Since our framework operates on pairs of original learner sentences and their corrected counterparts, we also evaluate a fully automated pipeline using automatic grammatical error correction. This pipeline closely approaches the performance of semi-automated systems based on manual corrections, particularly for the detection of successful attempts at grammatical constructs. Overall, our framework emphasises learners' successful attempts in addition to unsuccessful ones, enabling positive, formative feedback and providing actionable insights into grammatical development.
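A minimal sketch of a rule-based construct detector of the kind the abstract compares against LLMs: flag attempts at the English comparative "X-er than" / "more X than" construct with a regular expression. This is purely illustrative (the pattern and function names are my own, and the EGP taxonomy is far richer); its false positives also illustrate why rule-based methods struggle with semantically nuanced constructs.

```python
import re

# Matches "taller than", "more interesting than", etc.
# Known weakness: also fires on non-comparatives like "other than",
# which is the kind of nuance where LLM classifiers do better.
COMPARATIVE = re.compile(r"\b(\w+er|more \w+)\s+than\b", re.IGNORECASE)

def detect_comparative(sentence):
    """Return True if the sentence looks like a comparative attempt."""
    return bool(COMPARATIVE.search(sentence))
```

Pairing such a cheap pre-filter with an LLM judge is, per the abstract, the hybrid pipeline that performed best.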
Perla Elízabet Ventura Ramos, Jesús Zaratoga Martínez, Norma Yadira Memije Alarcón
Introducción: La educación es un derecho humano esencial y una herramienta poderosa para el desarrollo personal y comunitario. Sin embargo, las mujeres y niñas indígenas enfrentan múltiples desafíos que limitan su acceso al sistema educativo y afectan su desarrollo integral. Métodos: Este estudio propone acciones de sensibilización para contribuir al desarrollo de las mujeres y niñas indígenas en Atliaca, Guerrero, México. A través de un enfoque cualitativo y explicativo, se analizó su contexto sociocultural y educativo. Se realizaron entrevistas y encuestas a mujeres, niñas, autoridades educativas, docentes, padres y líderes comunitarios, utilizando procedimientos de estadística descriptiva para identificar patrones clave. Resultados: Se identificaron cuatro aspectos principales: la brecha de género y etnia en el acceso, permanencia y logros educativos; los factores sociales, culturales, económicos y políticos que limitan su derecho a la educación; las iniciativas actuales para promover su educación; y los beneficios asociados con su desarrollo educativo. Conclusiones: Se plantean acciones de sensibilización que visibilicen los beneficios de la educación, generen espacios de diálogo y colaboración, brinden orientación y asesoría, y promuevan programas de apoyo y becas. Estas acciones buscan garantizar un desarrollo integral para las mujeres y niñas indígenas.
Adrian Amed García Jardines
Introducción: Este artículo explora la relación entre el paisaje y la memoria cultural en el litoral del Centro Histórico Urbano de Santiago de Cuba, centrándose en la avenida Jesús Menéndez. Se analiza cómo las antiguas edificaciones industriales, vinculadas a la producción de ron y cerveza, forman parte de un patrimonio que integra lo urbano, lo natural y lo simbólico, reflejando la identidad histórica y cultural de la región. Métodos: A partir de la metodología de Espinosa Ocallaghan y Gómez Ortega (2022), se evalúa la integración paisajística mediante un análisis de unidades paisajísticas y áreas de percepción visual. Este enfoque permite identificar la calidad visual y la visibilidad del paisaje, así como su relación con los valores patrimoniales y culturales del área. Resultados: El estudio identifica un uso de suelo mixto, con escasez de servicios y tres unidades de paisaje de alto valor, afectadas por la falta de atención. Estas unidades permiten jerarquizar los elementos clave para futuras intervenciones que integren la conservación del patrimonio con el desarrollo comunitario. Conclusiones: El litoral de Santiago de Cuba es un espacio cultural e histórico, por lo que se propone un modelo sostenible que involucre a la comunidad para su conservación y valorización como recurso clave de identidad local.
Vlatka Dugački
Leon Bettscheider, Andreas Zeller
Generating effective test inputs for a software system requires that these inputs be valid, as they will otherwise be rejected without reaching actual functionality. In the absence of a specification for the input language, common test generation techniques rely on sample inputs, which are abstracted into matching grammars and/or evolved guided by test coverage. However, if sample inputs miss features of the input language, the chances of generating these features randomly are slim. In this work, we present the first technique for symbolically and automatically mining input grammars from the code of recursive descent parsers. So far, the complexity of parsers has made such a symbolic analysis challenging or even impossible. Our realization of the symbolic parsing technique overcomes these challenges by (1) associating each parser function parse_ELEM() with a nonterminal <ELEM>; (2) limiting recursive calls and loop iterations, such that a symbolic analysis of parse_ELEM() needs to consider only a finite number of paths; and (3) creating, for each path, an expansion alternative for <ELEM>. Being purely static, symbolic parsing does not require seed inputs; as it mitigates path explosion, it scales to complex parsers. Our evaluation shows symbolic parsing to be highly accurate. Applied to parsers for complex languages such as TINY-C or JSON, our STALAGMITE implementation extracts grammars with an accuracy of 99--100%, widely improving over the state of the art despite requiring only the program code and no input samples. The resulting grammars cover the entire input space, allowing for comprehensive and effective test generation, reverse engineering, and documentation.
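The parse_ELEM()-to-<ELEM> association is easy to see on a toy recursive descent parser (hypothetical example; not the STALAGMITE code). Each parser function below corresponds to a nonterminal, and each execution path through a function corresponds to one expansion alternative, which is what a symbolic analysis would enumerate:

```python
# Grammar mined from the two functions below, one alternative per path:
#   <expr>  ::= <digit> | <digit> "+" <expr>
#   <digit> ::= "0" | "1" | ... | "9"

def parse_digit(s, i):
    """Nonterminal <digit>: a single path accepting one digit."""
    if i < len(s) and s[i].isdigit():
        return i + 1
    raise SyntaxError(f"expected digit at position {i}")

def parse_expr(s, i=0):
    """Nonterminal <expr>: two paths, hence two alternatives."""
    i = parse_digit(s, i)           # path 1: <digit>
    if i < len(s) and s[i] == "+":  # path 2: <digit> "+" <expr>
        return parse_expr(s, i + 1)
    return i

def accepts(s):
    """True iff the parser consumes the whole input."""
    try:
        return parse_expr(s) == len(s)
    except SyntaxError:
        return False
```

Bounding the recursion in parse_expr (step 2 above) is what keeps the number of paths, and thus alternatives, finite.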
Edi Muškardin, Tamim Burgstaller
We present PAPNI, a passive automata learning algorithm capable of learning deterministic context-free grammars, which are modeled with visibly deterministic pushdown automata. PAPNI is a generalization of RPNI, a passive automata learning algorithm capable of learning regular languages from positive and negative samples. PAPNI uses RPNI as its underlying learning algorithm while assuming a priori knowledge of the visibly deterministic input alphabet, that is, the alphabet decomposition into symbols that push to the stack, pop from the stack, or do not affect the stack. In this paper, we show how passive learning of deterministic pushdown automata can be viewed as a preprocessing step of standard RPNI implementations. We evaluate the proposed approach on various deterministic context-free grammars found in the literature and compare the predictive accuracy of learned models with RPNI.
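The a-priori alphabet decomposition PAPNI assumes can be sketched in a few lines (hypothetical alphabet and function names; this is not the PAPNI implementation): once every symbol is classified as a call (push), return (pop), or internal symbol, the stack behavior of any word is fixed by the word itself and can be checked before learning starts.

```python
# Visibly pushdown alphabet decomposition: the symbol alone
# determines the stack action, independent of automaton state.
CALLS, RETURNS = {"("}, {")"}   # push / pop symbols
INTERNAL = {"a", "b"}           # symbols that leave the stack untouched

def well_matched(word):
    """True iff pops never underflow and the stack ends empty."""
    depth = 0
    for sym in word:
        if sym in CALLS:
            depth += 1
        elif sym in RETURNS:
            if depth == 0:
                return False    # pop on an empty stack
            depth -= 1
        elif sym not in INTERNAL:
            raise ValueError(f"symbol {sym!r} not in the alphabet")
    return depth == 0
```

Because the stack content is determined by the input word, samples can be preprocessed into an annotated form that a regular-language learner like RPNI can handle, which is the reduction the abstract describes.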
Jannik Olbrich
The Burrows-Wheeler Transform (BWT) serves as the basis for many important sequence indexes. On very large datasets (e.g. genomic databases), classical BWT construction algorithms are often infeasible because they usually need to have the entire dataset in main memory. Fortunately, such large datasets are often highly repetitive. It can thus be beneficial to compute the BWT from a compressed representation. We propose an algorithm for computing the BWT via the Lyndon straight-line program, a grammar based on the standard factorization of Lyndon words. Our algorithm can also be used to compute the extended BWT (eBWT) of a multiset of sequences. We empirically evaluate our implementation and find that we can compute the BWT and eBWT of very large datasets faster and/or with less memory than competing methods.
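For reference, the BWT itself has a compact textbook definition as the last column of the sorted rotation matrix. The sketch below uses that naive definition (the paper's contribution is precisely avoiding it by working on a grammar-compressed representation, so this is illustrative only):

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel and
    return the last column. O(n^2 log n) -- fine for a demo,
    infeasible for the genomic-scale inputs the paper targets."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(bwt("banana"))  # classic example: "annb$aa"
```

The output groups identical characters into runs, which is why the BWT underlies compressed indexes and why computing it directly from a compressed input is attractive.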
Jos C. M. Baeten, Bas Luttik
A classical theorem states that the set of languages given by a pushdown automaton coincides with the set of languages given by a context-free grammar. In previous work, we proved the pendant of this theorem in a setting with interaction: the set of processes given by a pushdown automaton coincides with the set of processes given by a finite guarded recursive specification over a process algebra with actions, choice, sequencing and guarded recursion, if and only if we add sequential value passing. In this paper, we examine what happens if we consider parallel pushdown automata instead of pushdown automata, and a process algebra with parallelism instead of sequencing.
Garrett Tanzer, Mirac Suzgun, Eline Visser et al.
Large language models (LLMs) can perform impressive feats with in-context learning or lightweight finetuning. It is natural to wonder how well these models adapt to genuinely new tasks, but how does one find tasks that are unseen in internet-scale training sets? We turn to a field that is explicitly motivated and bottlenecked by a scarcity of web data: low-resource languages. In this paper, we introduce MTOB (Machine Translation from One Book), a benchmark for learning to translate between English and Kalamang -- a language with fewer than 200 speakers and therefore virtually no presence on the web -- using several hundred pages of field linguistics reference materials. This task framing is novel in that it asks a model to learn a language from a single human-readable book of grammar explanations, rather than a large mined corpus of in-domain data, more akin to L2 learning than L1 acquisition. We demonstrate that baselines using current LLMs are promising but fall short of human performance, achieving 44.7 chrF on Kalamang to English translation and 45.8 chrF on English to Kalamang translation, compared to 51.6 and 57.0 chrF by a human who learned Kalamang from the same reference materials. We hope that MTOB will help measure LLM capabilities along a new dimension, and that the methods developed to solve it could help expand access to language technology for underserved communities by leveraging qualitatively different kinds of data than traditional machine translation.
Benedikt Szmrecsanyi, Alexandra Engel
Abstract In this paper, we operationalize register differences at the intersection of formality and mode, and distinguish four broad register categories: spoken informal (conversations), spoken formal (parliamentary debates), written informal (blogs), and written formal (newspaper articles). We are specifically interested in the comparative probabilistic/variationist complexity of these registers – when speakers have grammatical choices, are the probabilistic grammars regulating these choices more or less complex in particular registers than in others? Based on multivariate modeling of richly annotated datasets covering three grammatical alternations in two languages (English and Dutch), we assess the complexity of probabilistic grammars by drawing on three criteria: (a) the number of constraints on variant choice, (b) the number of interactions between constraints, and (c) the relative importance of lexical conditioning. Analysis shows that contrary to theorizing in variationist sociolinguistics, probabilistic complexity differences between registers are not quantitatively simple: formal registers are consistently the most complex ones, while spoken registers are the least complex ones. The most complex register under study is written-formal quality newspaper writing. We submit that the complexity differentials we uncover are a function of acquisitional difficulty, of on-line processing limitations, and of normative pressures.
Alice Corr
This book examines how speakers of Ibero-Romance ‘do things’ with conversational units of language, paying particular attention to what they do with utterance-oriented elements such as vocatives, interjections, and particles; and to what they do with illocutionary complementizers, items attested cross-linguistically which look like, but do not behave like, subordinators. Taking the behaviour of conversation-oriented units of language as a window into the indexical nature of language, it argues that these items provide insight into how language-as-grammar builds the universe of discourse. By identifying the underlying unity in how different Ibero-Romance languages, alongside their Romance cousins and Latin ancestors, use grammar to refer—i.e. to connect our inner world to the one outside—, the book’s empirical arguments are underpinned by the philosophical position that the architecture of grammar is also the architecture of thought. The book thus brings together the recent flurry of work seeking to incorporate aspects of the context of the utterance into the syntax, a line of enquiry broadly founded on empirical considerations, with the pursuit of explanatory adequacy via a so-called ‘un-Cartesian’ grammar of reference. In so doing, it formalizes the intuition that language users do things not with words, but with grammar. The book brings new insight to the comparative morphosyntax of (Ibero-)Romance, particularly in its diatopic, diastratic, and diamesic dimensions, and showcases the utility of careful descriptive work on this language family in advancing our empirical and conceptual understanding of the organization of grammar.
Yibin Zhang
English has become one of the compulsory subjects for students in China. As a foreign language whose grammatical structure differs, in some respects, from learners’ mother tongue, it requires teachers to research proper methods of presenting syntactic patterns for students’ sake. When teachers turn to linguistics, there are two well-known theories of syntax from different perspectives: transformational-generative grammar, proposed by Chomsky, and systemic functional grammar, by Halliday. Since most beginners may find it challenging to be exposed to a totally new language that embraces foreign cultures, learners are expected to start with the most fundamental syntax: the five basic English sentence patterns. For teachers, it is necessary to analyze those sentence patterns and devise practical teaching methods so that they can help learners study more efficiently. In this sense, this essay is especially meaningful. This dissertation aims to reveal the potential relations between the two theories in analyzing the five sentence patterns, as part of the effort to seek more appropriate ways of discussing English syntactic features. It may also, hopefully, bring some enlightenment to teachers. The method this paper applies is comparative analysis. The research concludes that each of the two theories has its place in explaining different types of sentences.
Wenqing Zhu, Norihiro Yoshida, Toshihiro Kamiya et al.
For various reasons, programming languages continue to multiply and evolve, so a multilingual clone detection tool that can easily expand its supported programming languages and detect various code clones has become necessary. However, research on multilingual code clone detection has not received sufficient attention. In this study, we propose MSCCD (Multilingual Syntactic Code Clone Detector), a grammar-pluggable code clone detection tool that uses a parser generator to generate a code block extractor for the target language. The extractor then extracts the semantic code blocks from a parse tree. MSCCD can detect Type-3 clones at various granularities. We evaluated MSCCD's language extensibility by applying it to 20 modern languages: sixteen were perfectly supported, and the remaining four were supported with the same detection capabilities at the expense of execution time. We evaluated MSCCD's recall using BigCloneEval and conducted a manual experiment to evaluate precision. MSCCD achieved detection performance equivalent to that of state-of-the-art tools.
Jiaheng Hu, Julian Whiman, Howie Choset
Robots have been used in all sorts of automation, and yet the design of robots remains mainly a manual task. We seek to provide design tools to automate the design of robots themselves. An important challenge in robot design automation is the large and complex design search space, which grows exponentially with the number of components, making optimization difficult and sample-inefficient. In this work, we present Grammar-guided Latent Space Optimization (GLSO), a framework that transforms design automation into a low-dimensional continuous optimization problem by training a graph variational autoencoder (VAE) to learn a mapping between the graph-structured design space and a continuous latent space. This transformation allows optimization to be conducted in a continuous latent space, where sample efficiency can be significantly boosted by applying algorithms such as Bayesian Optimization. GLSO guides training of the VAE using graph grammar rules and robot world space features, such that the learned latent space focuses on valid robots and is easier for the optimization algorithm to explore. Importantly, the trained VAE can be reused to search for designs specialized to multiple different tasks without retraining. We evaluate GLSO by designing robots for a set of locomotion tasks in simulation, and demonstrate that our method outperforms related state-of-the-art robot design automation methods.
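The latent-space idea can be sketched in a few lines: a toy "decoder" and a simple hill-climbing search stand in for the trained VAE and Bayesian Optimization. Everything here (the decoder, the fitness function, all names) is hypothetical; this is not the GLSO implementation, only the shape of the reduction from discrete design search to continuous optimization.

```python
import random

def decode(z):
    """Stand-in for the VAE decoder: latent vector -> discrete design
    (here, e.g., a joint type in 0..3 per limb)."""
    return [round(abs(x)) % 4 for x in z]

def fitness(design):
    """Stand-in for a simulated locomotion reward."""
    return sum(design) - 2 * design.count(0)

def latent_search(dim=4, iters=200, seed=0):
    """Hill-climb in the continuous latent space; every candidate
    is decoded back to a discrete design before evaluation."""
    rng = random.Random(seed)
    best_z = [rng.gauss(0, 1) for _ in range(dim)]
    best_f = fitness(decode(best_z))
    for _ in range(iters):
        z = [x + rng.gauss(0, 0.3) for x in best_z]
        f = fitness(decode(z))
        if f > best_f:
            best_z, best_f = z, f
    return decode(best_z), best_f
```

The point of the construction is that the optimizer only ever sees a continuous vector, so sample-efficient continuous methods apply even though the underlying design space is discrete and graph-structured.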
Soe Thandar Aung, N. Funabiki, Y. Syaifudin et al.
Nowadays, Java has been extensively adopted in practical IT systems as a reliable and portable object-oriented programming language. To encourage self-study of Java programming, we have developed a Web-based Java Programming Learning Assistant System (JPLAS). JPLAS provides several types of exercises to cover different levels. However, no existing type directly questions the grammar concepts of a source code, although such questions can be the first step for novice students. In this paper, we propose a Grammar-Concept Understanding Problem (GUP) as a new type in JPLAS. A GUP instance consists of a source code and a set of questions on grammar concepts or behaviors of the code. Each answer can be a number, a word, or a short sentence, whose correctness is marked through string matching with the correct one. We present an algorithm to automatically generate a GUP instance from a given source code by: 1) extracting the registered keywords in the code, 2) selecting the registered question corresponding to each keyword, and 3) detecting the data required in the correct answer from the code. For evaluation, we first generate 20 GUP instances with a total of 99 questions from simple codes on fundamental Java grammar, and assign them to 100 university students in Indonesia. We additionally generate 8 instances with a total of 30 questions, and assign all the instances to 29 undergraduates in Myanmar as a comparative study. The results show that the proposal is effective in improving the performance of students who are novices in Java programming.
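The three-step generation algorithm above can be sketched as follows (keyword table, question wording, and answer-extraction rule are all hypothetical simplifications, not the JPLAS implementation):

```python
# Step 2's registered questions, keyed by step 1's registered keywords.
QUESTIONS = {
    "class": "What is the name of the declared class?",
    "int":   "Which variable is declared with type int?",
}

def generate_gup(source):
    """Steps 1-3: scan for registered keywords, pair each with its
    registered question, and read the correct answer off the code
    (in these simple cases, the token following the keyword)."""
    tokens = source.replace("{", " ").replace(";", " ").split()
    return [(QUESTIONS[tok], tokens[i + 1])
            for i, tok in enumerate(tokens) if tok in QUESTIONS]

def mark(answer, correct):
    """Marking by string matching, as described in the abstract."""
    return answer.strip() == correct.strip()
```

Running generate_gup on `"class Hello { int count = 0; }"` yields two question/answer pairs, one per registered keyword found.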
V. Vysotska, Svitlana Holoshchuk, Roman Holoshchuk
K. Kabel, Mette Vedsgaard Christensen, L. Brok
ABSTRACT Studies exploring grammar teaching in first and foreign language subjects in Scandinavia are very rare. In this article, we present findings from a focused ethnographic study (Gramma3, 2018–2019) of grammar teaching practices in the three major first and foreign language subjects at lower-secondary level (age 13–15) in Denmark: Danish L1, English L2 and German L3, with data collected at seven schools. The dominance of traditional school grammar content in all three classrooms is one main finding. However, the approaches vary across the three subjects, mirrored also in different traditions and cultures for language learning within first and foreign language subjects. The co-existence of concurrent and even contradictory practices within each language subject is another main finding. Thus, the cross-curricular perspective of the present study leads to detailed findings suggesting new ways of understanding explicit grammar teaching in compulsory education. In this way, the study helps to shed light on an under-researched, yet key curricular content area in all three subjects, suggesting opportunities for cooperation between first and foreign language teachers. In turn, it contributes knowledge, which is valuable beyond the national context of the study, with the potential for comparative studies across borders.
L. Domínguez, María J. Arche
In this paper we argue that Bley-Vroman’s Comparative Fallacy, which warns against comparisons between native speakers and learners in second-language acquisition (SLA) research, is not justified on either theoretical or methodological grounds and should be abandoned, as it contravenes the explanatory nature of SLA research. We argue that for SLA to be able to provide meaningful explanations, grammatical comparisons with a baseline (usually, though not always, native speakers) are not only justified but necessary, a position which we call the ‘Comparative Logic’. The methodological choices assumed by this position ensure that interlanguage grammars are analysed in their own right, respecting their own principles. Related issues, such as why we focus on the native speaker and why investigating deficits in linguistic-cognitive SLA is essential to our field, are also discussed.
Page 37 of 185287