Results for "Comparative grammar"

Showing 20 of ~1,216,758 results · from arXiv, DOAJ, CrossRef

arXiv Open Access 2025
Robust Probabilistic Load Forecasting for a Single Household: A Comparative Study from SARIMA to Transformers on the REFIT Dataset

Midhun Manoj

Probabilistic forecasting is essential for modern risk management, allowing decision-makers to quantify uncertainty in critical systems. This paper tackles this challenge using the volatile REFIT household dataset, which is complicated by a large structural data gap. We first address this by conducting a rigorous comparative experiment to select a Seasonal Imputation method, demonstrating its superiority over linear interpolation in preserving the data's underlying distribution. We then systematically evaluate a hierarchy of models, progressing from classical baselines (SARIMA, Prophet) to machine learning (XGBoost) and advanced deep learning architectures (LSTM). Our findings reveal that classical models fail to capture the data's non-linear, regime-switching behavior. While the LSTM provided the most well-calibrated probabilistic forecast, the Temporal Fusion Transformer (TFT) emerged as the superior all-round model, achieving the best point forecast accuracy (RMSE 481.94) and producing safer, more cautious prediction intervals that effectively capture extreme volatility.
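The calibration and interval quality discussed in the abstract above are commonly scored with the quantile (pinball) loss and empirical interval coverage. A minimal sketch of both metrics, with made-up illustrative numbers (not values from the paper):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: penalizes under- and over-prediction
    asymmetrically so that y_pred is pushed toward the q-th quantile."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

def interval_coverage(y_true, lower, upper):
    """Fraction of observations falling inside the prediction interval."""
    return np.mean((y_true >= lower) & (y_true <= upper))

# Toy household-load values (watts) and a hypothetical 90% interval.
y  = np.array([100.0, 250.0, 480.0])
lo = np.array([ 80.0, 200.0, 300.0])
hi = np.array([150.0, 300.0, 450.0])

print(pinball_loss(y, lo, 0.05))   # loss of the lower-quantile forecast
print(interval_coverage(y, lo, hi))
```

A well-calibrated 90% interval should cover roughly 90% of observations; "safer, more cautious" intervals in the abstract's sense trade wider bounds for coverage closer to the nominal level.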

en cs.LG
arXiv Open Access 2025
Explain-then-Process: Using Grammar Prompting to Enhance Grammatical Acceptability Judgments

Russell Scheinberg, Ameeta Agrawal, Amber Shore et al.

Large language models (LLMs) can explain grammatical rules, yet they often fail to apply those rules when judging sentence acceptability. We present "grammar prompting", an explain-then-process paradigm: a large LLM first produces a concise explanation of the relevant syntactic phenomenon, then that explanation is fed back as additional context to the target model -- either an LLM or a smaller language model (SLM) -- before deciding which sentence of a minimal pair is grammatical. On the English BLiMP, Chinese SLING, and Russian RuBLiMP benchmarks, this simple prompt design yields substantial improvements over strong baselines across many syntactic phenomena. Feeding an LLM's metalinguistic explanation back to the target model bridges the gap between knowing a rule and using it. On SLMs, grammar prompting alone trims the average LLM-SLM accuracy gap by about 20%, and when paired with chain-of-thought, by 56% (13.0 pp -> 5.8 pp), all at negligible cost. The lightweight, language-agnostic cue lets low-cost SLMs approach frontier-LLM performance in multilingual settings.
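The explain-then-process paradigm described above can be sketched as a two-stage prompt pipeline. This is an illustrative sketch, not the authors' code: `ask_llm` stands in for any chat-completion call, and `toy_model` is a hypothetical stand-in used only to make the example runnable.

```python
def grammar_prompting(ask_llm, phenomenon, sentence_a, sentence_b):
    # Stage 1: a large model produces a concise explanation of the phenomenon.
    explanation = ask_llm(
        f"Briefly explain the rule governing {phenomenon} in English."
    )
    # Stage 2: the explanation is fed back as context before the
    # minimal-pair acceptability judgment.
    verdict = ask_llm(
        f"Rule: {explanation}\n"
        f"Which sentence is grammatical?\n"
        f"(A) {sentence_a}\n(B) {sentence_b}\n"
        "Answer A or B."
    )
    return verdict.strip()

# Hypothetical stand-in model for demonstration only.
def toy_model(prompt):
    if "Which sentence" in prompt:
        return "B"
    return "Anaphors such as 'herself' must be bound by a local antecedent."

print(grammar_prompting(toy_model, "anaphor agreement",
                        "Herself saw Mary.", "Mary saw herself."))
```

In practice the two stages may use different models, with a frontier LLM explaining and a small language model judging, which is what narrows the LLM-SLM gap.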

en cs.CL, cs.AI
arXiv Open Access 2024
The Computational Learning of Construction Grammars: State of the Art and Prospective Roadmap

Jonas Doumen, Veronica Juliana Schmalz, Katrien Beuls et al.

This paper documents and reviews the state of the art concerning computational models of construction grammar learning. It brings together prior work on the computational learning of form-meaning pairings, which has so far been studied in several distinct areas of research. The goal of this paper is threefold. First of all, it aims to synthesise the variety of methodologies that have been proposed to date and the results that have been obtained. Second, it aims to identify those parts of the challenge that have been successfully tackled and reveal those that require further research. Finally, it aims to provide a roadmap which can help to boost and streamline future research efforts on the computational learning of large-scale, usage-based construction grammars.

en cs.CL, cs.AI
DOAJ Open Access 2024
Non-verbal predications in Zarma

Mahamane L. Abdoulaye, Salimata Abdoulrazikou

This article presents new findings in the use of copulas nôo ‘be’ and ti ‘be’ in non-verbal predications in Zarma (Songhay; Niger, Nigeria). Based on some exclusive contexts of use and some morphosyntactic criteria, the article distinguishes a basic type of predication with one term “NP + nôo” used in deictic identification (e.g.: Abdù nôo ‘it’s Abdu’) and a type of predication with two terms “NP1 + NP2 + nôo” used in nominal predications and equative sentences (e.g.: wodìn Abdù nôo ‘that is Abdu’). The article shows that copula ti replaces copula nôo in negation but also in non-verbal focus constructions where it is generally preceded by the subordinating conjunction kà/gà and very likely marks the presupposed part of the sentence (e.g.: [Muusà nôo] kà ti càwkŏo ‘[it’s Musa] who is a student’). Finally, the article shows that in Zarma, it is the one-term predication “NP + nôo” that is recruited to mark focus-fronted constituents of verbal and non-verbal predications, thus confirming an observation already made about other languages.

Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar
arXiv Open Access 2023
Minimalist Grammar: Construction without Overgeneration

Isidor Konrad Maier, Johannes Kuhn, Jesse Beisegel et al.

In this paper we give instructions on how to write a minimalist grammar (MG). In order to present the instructions as an algorithm, we use a variant of context-free grammars (CFGs) as an input format. We can exclude overgeneration if the CFG has no recursion, i.e. no non-terminal can (indirectly) derive a right-hand side containing itself. The constructed MGs utilize licensors/-ees as a special way of exception handling. A CFG format for a derivation $A\_eats\_B \mapsto^* peter\_eats\_apples$, where $A$ and $B$ generate noun phrases, normally leads to overgeneration, e.g. $i\_eats\_apples$. In order to avoid overgeneration, a CFG would need many non-terminal symbols and rules that mainly produce the same word, just to handle exceptions. In our MGs, however, we can summarize CFG rules that produce the same word in one item and handle exceptions by a proper distribution of licensees/-ors. The difficulty with this technique is that in most generations the majority of licensees/-ors is not needed but still has to be triggered somehow. We solve this problem with $ε$-items called \emph{adapters}.
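The no-recursion precondition above (no non-terminal may indirectly derive a right-hand side containing itself) reduces to cycle detection over the "occurs on a right-hand side of" relation between non-terminals. A minimal sketch, assuming a CFG represented as a dict from each non-terminal to a list of right-hand sides (lists of symbols):

```python
def is_recursive(cfg):
    """Return True iff some non-terminal can (indirectly) derive a
    right-hand side containing itself, i.e. the non-terminal graph
    has a cycle."""
    # Edge X -> Y whenever Y occurs on some right-hand side of X.
    edges = {x: {s for rhs in rules for s in rhs if s in cfg}
             for x, rules in cfg.items()}

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on stack / done
    color = {x: WHITE for x in cfg}

    def dfs(x):
        color[x] = GRAY
        for y in edges[x]:
            if color[y] == GRAY or (color[y] == WHITE and dfs(y)):
                return True        # back edge found: recursion
        color[x] = BLACK
        return False

    return any(color[x] == WHITE and dfs(x) for x in cfg)

# Toy grammars echoing the abstract's example (lowercase = terminals).
acyclic = {"S": [["A", "eats", "B"]], "A": [["peter"]], "B": [["apples"]]}
cyclic  = {"S": [["a", "S", "b"], ["c"]]}
print(is_recursive(acyclic), is_recursive(cyclic))  # False True
```

Only grammars for which this check returns False qualify as input to the construction the abstract describes.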

en cs.CL
DOAJ Open Access 2023
Writing the Self: Interior Voyage in 19th Century French Travel Writing

Andi Mustofa, Wening Udasmoro, Sri Ratna Saktimulya

Travel is a moment of looking inward that shapes the traveler's existence, along with meeting and interacting with others. The self as a traveler experiences internal dynamics reflected in travel writing. This paper analyzes five French travel accounts to reveal the self-construction of travelers who explored the East in the 19th century. The analysis shows that travelers' self-construction divides into Enlightenment or Romantic subjects, and into true travelers or travelers as tourists. The Enlightenment subject prioritizes facts and empirical knowledge outside the self for the broader interest. In contrast, the Romantic subject puts forward subjective and emotional attitudes in dealing with and narrating others, used for personal gain. True travelers seek out difficulties in other places to prove themselves by conquering the challenges. Travelers as tourists try to avoid obstacles by seeking safety and comfort during the trip. The East as a travel destination is a space that offers difficulties for constructing and legitimizing the traveler's self-image with the attributes society expects, such as courage and persistence. The five French travelers, whether Enlightenment or Romantic subjects, true travelers or tourists, had varying knowledge of the others owing to factors such as the purpose of the trip, profession, social status, and duration of the trip. Knowledge of the others and self-disclosure narrated in travel writing manifest the French travelers' power to control and manage themselves and to represent the Other.

Language. Linguistic theory. Comparative grammar
arXiv Open Access 2022
Marginal Inference queries in Hidden Markov Models under context-free grammar constraints

Reda Marzouk, Colin de La Higuera

The primary use of any probabilistic model involving a set of random variables is to run inference and sampling queries on it. Inference queries in classical probabilistic models are concerned with the computation of marginal or conditional probabilities of events given as input. When the probabilistic model is sequential, more sophisticated marginal inference queries involving complex grammars may be of interest in fields such as computational linguistics and NLP. In this work, we address the question of computing the likelihood of context-free grammars (CFGs) in Hidden Markov Models (HMMs). We provide a dynamic-programming algorithm for the exact computation of the likelihood for the class of unambiguous context-free grammars. We show that the problem is NP-Hard, even with the promise that the input CFG has a degree of ambiguity less than or equal to 2. We then propose a fully polynomial randomized approximation scheme (FPRAS) to approximate the likelihood for the case of polynomially bounded ambiguous CFGs.

en cs.AI, cs.FL
arXiv Open Access 2022
Czech Grammar Error Correction with a Large and Diverse Corpus

Jakub Náplava, Milan Straka, Jana Straková et al.

We introduce a large and diverse Czech corpus annotated for grammatical error correction (GEC) with the aim to contribute to the still scarce data resources in this domain for languages other than English. The Grammar Error Correction Corpus for Czech (GECCC) offers a variety of four domains, covering error distributions ranging from high error density essays written by non-native speakers, to website texts, where errors are expected to be much less common. We compare several Czech GEC systems, including several Transformer-based ones, setting a strong baseline to future research. Finally, we meta-evaluate common GEC metrics against human judgements on our data. We make the new Czech GEC corpus publicly available under the CC BY-SA 4.0 license at http://hdl.handle.net/11234/1-4639 .

arXiv Open Access 2022
Faster and Better Grammar-based Text-to-SQL Parsing via Clause-level Parallel Decoding and Alignment Loss

Kun Wu, Lijie Wang, Zhenghua Li et al.

Grammar-based parsers have achieved high performance on the cross-domain text-to-SQL parsing task, but suffer from low decoding efficiency because the number of actions for grammar selection is much larger than the number of tokens in SQL queries. Meanwhile, better aligning SQL clauses with question segments has been a key challenge for parsing performance. This paper therefore proposes clause-level parallel decoding and an alignment loss to enhance two high-performance grammar-based parsers, i.e., RATSQL and LGESQL. Experimental results on the two parsers show that our method obtains consistent improvements in both accuracy and decoding speed.

en cs.CL
arXiv Open Access 2021
Something Old, Something New: Grammar-based CCG Parsing with Transformer Models

Stephen Clark

This report describes the parsing problem for Combinatory Categorial Grammar (CCG), showing how a combination of Transformer-based neural models and a symbolic CCG grammar can lead to substantial gains over existing approaches. The report also documents a 20-year research program, showing how NLP methods have evolved over this time. The staggering accuracy improvements provided by neural models for CCG parsing can be seen as a reflection of the improvements seen in NLP more generally. The report provides a minimal introduction to CCG and CCG parsing, with many pointers to the relevant literature. It then describes the CCG supertagging problem, and some recent work from Tian et al. (2020) which applies Transformer-based models to supertagging with great effect. I use this existing model to develop a CCG multitagger, which can serve as a front-end to an existing CCG parser. Simply using this new multitagger provides substantial gains in parsing accuracy. I then show how a Transformer-based model from the parsing literature can be combined with the grammar-based CCG parser, setting a new state-of-the-art for the CCGbank parsing task of almost 93% F-score for labelled dependencies, with complete sentence accuracies of over 50%.

en cs.CL
arXiv Open Access 2021
Structured Context and High-Coverage Grammar for Conversational Question Answering over Knowledge Graphs

Pierre Marion, Paweł Krzysztof Nowak, Francesco Piccinno

We tackle the problem of weakly-supervised conversational Question Answering over large Knowledge Graphs using a neural semantic parsing approach. We introduce a new Logical Form (LF) grammar that can model a wide range of queries on the graph while remaining sufficiently simple to generate supervision data efficiently. Our Transformer-based model takes a JSON-like structure as input, allowing us to easily incorporate both Knowledge Graph and conversational contexts. This structured input is transformed to lists of embeddings and then fed to standard attention layers. We validate our approach, both in terms of grammar coverage and LF execution accuracy, on two publicly available datasets, CSQA and ConvQuestions, both grounded in Wikidata. On CSQA, our approach increases the coverage from $80\%$ to $96.2\%$, and the LF execution accuracy from $70.6\%$ to $75.6\%$, with respect to previous state-of-the-art results. On ConvQuestions, we achieve competitive results with respect to the state-of-the-art.

en cs.CL
arXiv Open Access 2021
Expectation-based Minimalist Grammars

Cristiano Chesi

Expectation-based Minimalist Grammars (e-MGs) are simplified versions of the (Conflated) Minimalist Grammars, (C)MGs, formalized by Stabler (1997, 2011, 2013), and of Phase-based Minimalist Grammars, PMGs (Chesi, 2005, 2007; Stabler, 2011). The crucial simplification consists of driving structure building only by relying on lexically encoded categorial top-down expectations. The commitment to a top-down derivation (as in e-MGs and PMGs, as opposed to (C)MGs; Chomsky, 1995; Stabler, 2011) allows us to define a core derivation that should be the same in both parsing and generation (Momma & Phillips, 2018).

en cs.CL, cs.CC
DOAJ Open Access 2021
Didattica teatrale e acquisizione linguistica: un’analisi metodologica

Peppoloni, Diana

The present study, conducted in a secondary school in Perugia (Italy), aimed to verify the benefits of a theatrical teaching methodology for the acquisition of a foreign language, in this case Spanish. The project, funded through the Three-Year Plan of the Arts, ran for 20 hours over a school term, involving 23 students in the theatrical teaching group and 30 learners in the control group. The paper thoroughly describes the original experimental protocol in all its phases, in order to codify it and make it replicable. It also reports the results obtained, measured quantitatively and statistically through a language pre-test administered at the beginning of the project and a final post-test.

Language. Linguistic theory. Comparative grammar
arXiv Open Access 2020
Knowledge Graph Embedding for Link Prediction: A Comparative Analysis

Andrea Rossi, Donatella Firmani, Antonio Matinata et al.

Knowledge Graphs (KGs) have found many applications in industry and academic settings, which in turn have motivated considerable research efforts towards large-scale information extraction from a variety of sources. Despite such efforts, it is well known that even state-of-the-art KGs suffer from incompleteness. Link Prediction (LP), the task of predicting missing facts among entities already in a KG, is a promising and widely studied approach to addressing KG incompleteness. Among recent LP techniques, those based on KG embeddings have achieved very promising performance on some benchmarks. Despite the fast-growing literature on the subject, insufficient attention has been paid to the effect of the various design choices in those methods. Moreover, the standard practice in this area is to report accuracy by aggregating over a large number of test facts in which some entities are over-represented; this allows LP methods to exhibit good performance by just attending to structural properties that involve such entities, while ignoring the remaining majority of the KG. This analysis provides a comprehensive comparison of embedding-based LP methods, extending the dimensions of analysis beyond what is commonly available in the literature. We experimentally compare the effectiveness and efficiency of 16 state-of-the-art methods, consider a rule-based baseline, and report detailed analysis over the most popular benchmarks in the literature.
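One family of embedding-based LP methods the survey covers can be illustrated with TransE-style scoring: a triple (h, r, t) is considered plausible when the head embedding translated by the relation embedding lands near the tail embedding. The sketch below uses random stand-in vectors, not trained embeddings, and hypothetical entity names:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
# Random stand-in embeddings; a real system would learn these by training.
entities = {e: rng.normal(size=dim) for e in ["paris", "france", "berlin"]}
relations = {"capital_of": rng.normal(size=dim)}

def score(h, r, t):
    # TransE scoring: lower L2 distance of h + r from t = more plausible.
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

def rank_tails(h, r, true_t):
    # Rank the true tail among all candidate entities (rank 1 is best);
    # aggregating such ranks yields the usual MRR / Hits@k metrics.
    candidates = sorted(entities, key=lambda t: score(h, r, t))
    return candidates.index(true_t) + 1

print(rank_tails("paris", "capital_of", "france"))
```

The over-representation problem the abstract raises shows up exactly here: if test ranks are averaged over facts dominated by a few frequent entities, a model can score well on those while ranking the long tail poorly.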

en cs.LG, cs.DB
arXiv Open Access 2020
Logical foundations for hybrid type-logical grammars

Richard Moot, Symon Stevens-Guille

This paper explores proof-theoretic aspects of hybrid type-logical grammars, a logic combining Lambek grammars with lambda grammars. We prove some basic properties of the calculus, such as normalisation and the subformula property, and present both a sequent calculus and a proof net calculus for hybrid type-logical grammars. In addition to clarifying the logical foundations of hybrid type-logical grammars, the current study opens the way to variants and extensions of the original system, including but not limited to a non-associative version and a multimodal version incorporating structural rules and unary modes.

en cs.CL, math.LO
DOAJ Open Access 2020
Apprendre le breton, est-ce faire « communauté » ?

Hugues Pentecouteau, Pierre Servain

Often, the term ‘community’ is used to refer to a social group and the links that exist and are created between the individuals who are part of it. However, this notion is open to interpretation depending on the nature of these links. Indeed, it can be used as much to stigmatise as to valorise a social group depending on the context and the way in which the group is perceived. In this article, we propose to discuss this notion in the context of adult Breton language learning by examining whether learning this language involves the creation of a linguistic, cultural and/or political community. The study was based on longitudinal participant observation and informal interviews conducted from 2012 to 2016 during language courses offered by KEAV (Kamp Etrekeltiek Ar Vrezhonegerion, inter-Celtic camp of Breton speakers).

Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2020
”Hårt arbete har gjort mig till den jag är” – pengar och moraliskt värde som pedagogiska aspekter i två Kalle Anka-album

Lars Wallner

Kalle Anka & Co (the Swedish Donald Duck comic) has been praised for its storytelling and characters, while the series has also been criticized as a vehicle of cultural imperialism. This article discusses two comic albums created for use in Swedish compulsory school, compilations of previously published comics. Since these collections are intended for school use, they can also be expected to portray values accordingly. As the centre of the plot, or of character motivation, money is the single most common factor in these stories. Nine stories were selected and analysed in terms of 1) how poor and rich characters are portrayed through image and text, and 2) how money and materialism are constructed as moral values. The analysis shows how values concerning materialism and money are represented by different characters: Joakim von Anka/Scrooge McDuck (wealth, stinginess), Kalle Anka/Donald Duck (laziness, wastefulness), and Knattarna/the nephews (goodness, industriousness). As representatives of the child reader, the nephews are depicted as good workers, for whom work is its own reward and essential to good character. However, wealth that comes from hard work (for example Scrooge's) can be associated with immorality if the money is not used for good purposes. Furthermore, some characters are portrayed as intrinsically poor, which can be seen as problematic if it is not discussed with young readers. Here, the teacher can be decisive for pupils' opportunities for textual interpretation.

Education (General), Language. Linguistic theory. Comparative grammar
arXiv Open Access 2019
parboiled2: a macro-based approach for effective generators of parsing expressions grammars in Scala

Alexander A. Myltsev

In today's computerized world, parsing is ubiquitous. Developers parse logs, queries to databases and websites, and programming and natural languages. When Java-ecosystem maturity, concise syntax, and runtime speed matter, developers choose parboiled2, which generates parsers for parsing expression grammars (PEGs). The following open-source libraries have chosen parboiled2 for their parsing facilities:

- akka-http, the streaming-first HTTP server/module of Lightbend Akka
- Sangria, a Scala GraphQL implementation
- http4s, a minimal, idiomatic Scala interface for HTTP
- cornichon, a Scala DSL for testing HTTP JSON APIs
- scala-uri, a simple Scala library for building and parsing URIs

The library uses a wide range of Scala facilities to provide the required functionality. We also discuss extensions to PEGs. In particular, we show the implementation of an internal Scala DSL that features intuitive syntax and semantics. We demonstrate how parboiled2 extensively uses Scala typing to verify DSL integrity. We also show the connections to the inner structures of parboiled2, which can give the developer a better understanding of how to compose more effective grammars. Finally, we expose how a grammar is expanded with Scala Macros to effective runtime code.

en cs.PL, cs.SE
arXiv Open Access 2019
Inducing Sparse Coding and And-Or Grammar from Generator Network

Xianglei Xing, Song-Chun Zhu, Ying Nian Wu

We introduce an explainable generative model by applying sparse operation on the feature maps of the generator network. Meaningful hierarchical representations are obtained using the proposed generative model with sparse activations. The convolutional kernels from the bottom layer to the top layer of the generator network can learn primitives such as edges and colors, object parts, and whole objects layer by layer. From the perspective of the generator network, we propose a method for inducing both sparse coding and the AND-OR grammar for images. Experiments show that our method is capable of learning meaningful and explainable hierarchical representations.

en cs.LG, cs.AI

Page 27 of 60838