Results for "Comparative grammar"

Showing 20 of ~3705302 results · from DOAJ, arXiv, Semantic Scholar, CrossRef

arXiv Open Access 2026
Precise Robot Command Understanding Using Grammar-Constrained Large Language Models

Xinyun Huo, Raghav Gnanasambandam, Xinyao Zhang

Human-robot collaboration in industrial settings requires precise and reliable communication to enhance operational efficiency. While Large Language Models (LLMs) understand general language, they often lack the domain-specific rigidity needed for safe and executable industrial commands. To address this gap, this paper introduces a novel grammar-constrained LLM that integrates a grammar-driven Natural Language Understanding (NLU) system with a fine-tuned LLM, which enables both conversational flexibility and the deterministic precision required in robotics. Our method employs a two-stage process. First, a fine-tuned LLM performs high-level contextual reasoning and parameter inference on natural language inputs. Second, a Structured Language Model (SLM) and a grammar-based canonicalizer constrain the LLM's output, forcing it into a standardized symbolic format composed of valid action frames and command elements. This process guarantees that generated commands are valid and structured in a robot-readable JSON format. A key feature of the proposed model is a validation and feedback loop. A grammar parser validates the output against a predefined list of executable robotic actions. If a command is invalid, the system automatically generates corrective prompts and re-engages the LLM. This iterative self-correction mechanism allows the model to recover from initial interpretation errors to improve system robustness. We evaluate our grammar-constrained hybrid model against two baselines: a fine-tuned API-based LLM and a standalone grammar-driven NLU model. Using the Human Robot Interaction Corpus (HuRIC) dataset, we demonstrate that the hybrid approach achieves superior command validity, which promotes safer and more effective industrial human-robot collaboration.

en cs.RO, cs.CL
arXiv Open Access 2025
On the sensitivity of CDAWG-grammars

Hiroto Fujimaru, Shunsuke Inenaga

The compact directed acyclic word graph (CDAWG) [Blumer et al. 1987] of a string is the minimal compact automaton that recognizes all the suffixes of the string. CDAWGs are known to be useful for various string tasks including text pattern searching, data compression, and pattern discovery. The CDAWG-grammar [Belazzougui & Cunial 2017] is a grammar-based text compression method based on the CDAWG. In this paper, we prove that the CDAWG-grammar size $g$ can increase by at most an additive term of $4e + 4$ over its original value after any single-character edit operation is performed on the input string, where $e$ denotes the number of edges in the corresponding CDAWG before the edit.

en cs.DS
arXiv Open Access 2025
Directed Graph Grammars for Sequence-based Learning

Michael Sun, Orion Foo, Gang Liu et al.

Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner, because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approach to constructing a principled, compact and equivalent sequential representation of a DAG. Specifically, we view a graph as derivations over an unambiguous grammar, where the DAG corresponds to a unique sequence of production rules. Equivalently, the procedure to construct such a description can be viewed as a lossless compression of the data. Such a representation has many uses, including building a generative model for graph generation, learning a latent space for property prediction, and leveraging the sequence representational continuity for Bayesian Optimization over structured data. Code is available at https://github.com/shiningsunnyday/induction.

en cs.LG
arXiv Open Access 2025
Memelang: An Axial Grammar for LLM-Generated Vector-Relational Queries

Bri Holt

Structured generation for LLM tool use highlights the value of compact DSL intermediate representations (IRs) that can be emitted directly and parsed deterministically. This paper introduces axial grammar: linear token sequences that recover multi-dimensional structure from the placement of rank-specific separator tokens. A single left-to-right pass assigns each token a coordinate in an n-dimensional grid, enabling deterministic parsing without parentheses or clause-heavy surface syntax. This grammar is instantiated in Memelang, a compact query language intended as an LLM-emittable IR whose fixed coordinate roles map directly to table/column/value slots. Memelang supports coordinate-stable relative references, parse-time variable binding, and implicit context carry-forward to reduce repetition in LLM-produced queries. It also encodes grouping, aggregation, and ordering via inline tags on value terms, allowing grouped execution plans to be derived in one streaming pass over the coordinate-indexed representation. Provided are a reference lexer/parser and a compiler that emits parameterized PostgreSQL SQL (optionally using pgvector operators).

en cs.DB
DOAJ Open Access 2024
Rhetorical Strategies of Counteracting Conspiracy-based Dissent on COVID-19 Vaccines: the #ThinkBeforeSharing Institutional Campaign

Roberta Martina Zagarella, Marco Annoni

This paper aims to explore how institutions may counteract conspiracy theories using appropriate discursive resources. We use a rhetorical approach to analyze the first European information campaign launched in 2020 to counteract conspiracy theories about COVID-19 vaccines. On this basis, we advance a series of practical recommendations for institutions to counteract conspiracy theories through information campaigns.

Style. Composition. Rhetoric
arXiv Open Access 2024
Evolving Algebraic Multigrid Methods Using Grammar-Guided Genetic Programming

Dinesh Parthasarathy, Wayne Bradford Mitchell, Harald Köstler

Multigrid methods, despite being known to be asymptotically optimal algorithms, depend on the careful selection of their individual components for efficiency. They are also mostly restricted to standard cycle types such as V-, F-, and W-cycles. We use grammar rules to generate arbitrary-shaped cycles, wherein the smoothers and their relaxation weights are chosen independently at each step within the cycle. We call this a flexible multigrid cycle. These flexible cycles are used in Algebraic Multigrid (AMG) methods with the help of grammar rules and optimized using genetic programming. The flexible AMG methods are implemented in the hypre software library, and the programs are optimized separately for two cases: a standalone AMG solver for a 3D anisotropic problem and an AMG preconditioner with conjugate gradient for a multiphysics code. We observe that the optimized flexible cycles provide higher efficiency and better performance than the standard cycle types.

en cs.CE, cs.AI
arXiv Open Access 2024
From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars

Albert Kornilov, Tatiana Shavrina

Recent advances in language modeling have demonstrated significant improvements in zero-shot capabilities, including in-context learning, instruction following, and machine translation for extremely under-resourced languages (Tanzer et al., 2024). However, many languages with limited written resources rely primarily on formal descriptions of grammar and vocabulary. In this paper, we introduce a set of benchmarks to evaluate how well models can extract and classify information from the complex descriptions found in linguistic grammars. We present a Retrieval-Augmented Generation (RAG)-based approach that leverages these descriptions for downstream tasks such as machine translation. Our benchmarks encompass linguistic descriptions for 248 languages across 142 language families, focusing on typological features from WALS and Grambank. This set of benchmarks offers the first comprehensive evaluation of language models' in-context ability to accurately interpret and extract linguistic features, providing a critical resource for scaling NLP to low-resource languages. The code and data are publicly available at https://github.com/al-the-eigenvalue/RAG-on-grammars.

en cs.CL
DOAJ Open Access 2023
An Attempt to Unify the Orthography of the Kurdish Language

Huda Salih

A unified orthography is regarded as an important factor in language standardization. Although writing problems exist in every language, specialists need to find a solution for them. This study identifies several letters that are subject to problems and differing opinions, reviews the views of a number of linguists, presents the necessary evidence, and finally decides whether each letter belongs in Kurdish orthography. The letters selected as data for the study are < ح ،ڕ، ع، غ، ڵ، ¡، وو>. The results show that some of these letters do have a place in Kurdish orthography: Kurdish speakers perceive them and they recur in many words, and they count as letters of Kurdish because in many words exchanging them with another letter changes the meaning. Others are not original letters of Kurdish and are treated as allophones of one another. Kurdish orthography also needs a symbol for the bizroke (the short vowel), because a Kurdish syllable cannot be formed without a vowel and because replacing it with another letter changes the meaning. Likewise, any word containing it can be written with a single letter without difficulty, because no change of meaning results.

Indo-Iranian languages and literature, Language. Linguistic theory. Comparative grammar
arXiv Open Access 2023
MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network

Akihiro Kishimoto, Hiroshi Kajino, Masataka Hirose et al.

Property prediction plays an important role in material discovery. As an initial step toward eventually developing a foundation model for material science, we introduce a new autoencoder called MHG-GNN, which combines a graph neural network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.

en cs.LG
DOAJ Open Access 2022
Professionalization of Hypermedia Journalism Practice among Graduates of the Universidad Central «Marta Abreu» de Las Villas

Grettel Rodríguez Bazán, Anniel Hernández Villa, Mariela Díaz Ramírez

ABSTRACT: Introduction: The practice of journalism in the digital space is gaining prominence today; accordingly, this article aims to propose a strategy for raising the quality of the hypermedia journalism produced by graduates of the Journalism program at the Universidad Central «Marta Abreu» de Las Villas. Methods: Methods such as grounded theory and phenomenology were applied to obtain the results. Results: Changes are designed in three fundamental areas (professional training, the media, and postgraduate training) that contribute to raising the level of graduates. Conclusions: The importance of raising the quality of hypermedia journalism among graduates calls for an interdisciplinary outlook and interdisciplinary action.

Philology. Linguistics, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2022
The Preconceptual Basis of Noun Class (Gender)

Patrik Bye

Noun class is widely seen as “standing out” from other morphosyntactic categories in having a basis in ontological beliefs, or a ‘semantic core’. The consequence of this view is that noun classes in natural languages frequently do not cohere semantically. Here I motivate an aspectual alternative according to which noun class is grounded in low-level cognitive processes including the detection of agency and sex-related cues (including shape/size) and ‘mode’ of attention. This suggests a way of bringing noun class more into line with the perspectivizing contribution of morphosyntactic features in general.

Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2022
The Image of Chelyabinsk in the 20th century British Media Discourse (1901-1950)

Olga A. Solopova, Natalya N. Koshkarova, Igor V. Sibiriakov

The paper studies the evolution of the image of Chelyabinsk in 20th-century British media discourse. The research is relevant because it involves both linguistic and historical analyses; it aims at a retrospective study of the evolution of the image of a foreign city in British media discourse over a large time span. A wide range of methods is employed in the study: comparative, diachronic, cognitive-matrix, and cognitive-discursive methods, source study, and content analysis. The source of the data is a digitized archive of British historical media texts. The authors recorded nine variations of the city name. The frequency of modeling the image of Chelyabinsk varies: it is rather high at the beginning of the century, declines in the second decade, reaches its minimum in 1921-1930, and rises again in the subsequent decades, which is explained by the interest of the British media in industrialization and the events of World War II. Most of the newspapers and magazines that modelled the image of Chelyabinsk were published in the capitals and large industrial centres, which is explained by the peculiarities of British print media, the higher level of education of large-city residents, and Britain's economic interests in Russia / the Soviet Union. The significant difference in the images of Chelyabinsk across time lies in their emotive load: the negative images of the beginning of the century contrast with the positive images generated in the latest time span.

Language. Linguistic theory. Comparative grammar, Semantics
DOAJ Open Access 2022
“Ukraine above all” or “Pushkin is our everything”? (Ukrainian education and Russian literature)

Ю. Ковбасенко

The article examines three types of war that Russia has imposed on Ukraine in hybrid fashion: the «war of Ares», the «war of Athena», and the «war of Apollo», as well as the causes, course, and results of using culture and literature as a glittering veil to mask the imperial essence of the «Russian world», the «crooked face of Russia» (M. Gogol). It analyzes why, despite the full-scale aggression launched by the Russian Federation against Ukraine on 24.02.2022 and the war crimes of the Rashists proven by international courts, reverence for Russia and its «great» literature and culture still persists in broad circles of the world community and even of Ukrainian society. It concludes that the deadly dangerous (like «Snow White's apple») combination of aesthetic appeal on the one hand and imperial ideological toxicity on the other (especially under the full-scale military aggression of the Russian Federation, when even the Russian language itself, in which these works are written, has become a trigger for millions of Ukrainians) makes Russian literature absolutely unacceptable for study in Ukraine's general secondary education institutions. It traces the origins and stages of the entrenchment of the myth of the «world greatness» of Russian literature and reaches the substantiated conclusion that the significant share of Russian works in our school curricula is not evidence of their supposed «world» ideological and aesthetic level but a heavy legacy of the imperial (including Soviet) era, when forced assimilation (Russification) of the population took place in the lands colonized by Muscovy (including Ukraine), so that everything Russian was imposed by force. It forecasts effective ways of correcting the strategies for studying Russian literature in Ukraine's higher education institutions: intensive use of postcolonial interpretation and comparative analysis, renewal of the range of literary works studied, and new approaches to studying writers' biographies.
It is noted that a strategic turn in the teaching of Russian literature and culture in Ukraine's higher education institutions will require titanic efforts not only from educators but from the entire state, as well as the development and implementation of a special targeted state program. Keywords: «war of Apollo», hybrid war, glorification of the imperial literary canon, imperial myth, national identity, postcolonial studies, «Rashism», semantic (paradigmatic) war, «troubadours of the Empire».

Discourse analysis, Computational linguistics. Natural language processing
S2 Open Access 2020
Redeeming the ‘ordinary working class’

Robbie Shilliam

Critical responses to the rise of right-wing populism in the Western world have done much to draw attention to the racialization of moral economies. However, it is not only remarkable that class has returned to the grammar of politics as an intractably racialized category – the white-working-class; it is just as remarkable that the racialized moral opprobrium of the underclass has given way rhetorically and ideologically to a racialized moral commitment to social justice for the ordinary working class. More critical reflection is needed to understand the way in which the imagined constituency of populist lore is worthy of redemption not just by virtue of their whiteness but of their white-ordinary-working-classness. This article presents a series of key comparative moments in debates over social security and welfare provision – past and present – that demonstrate the centrality of labour’s ‘cooperative spirit’ for political-philosophical debates over social security and welfare. To this end, the author methodologically sketches out a set of political ‘grammars’ that through these debates frame ethical quandaries and policy prescriptions. The author argues that such political grammars have variously apprehended the orderly or disorderly nature of labour’s cooperative spirit by reference to patriarchal and eugenic filiations. While the debates interrogated here have no doubt utilized different terms and categories, their grammars resonate strongly. This gives cause to consider that the redemption of the ‘ordinary’ working class requires the segregation of that class along imperial – and postimperial – lines of heredity.

7 citations en Sociology
S2 Open Access 2020
Holding the mirror up to converted languages: Two grammars, one lexicon

Felicity Meakins, Rob Pensalfini

Aims and objectives/purpose/research questions: This article describes an unusual result of language contact occurring in North-Central Australia, where extensive long-term contact between speakers of the genetically unrelated Jingulu and Mudburra has resulted in a high degree of lexical borrowing, with little if any change to syntactic or morphological structure in either language. What is particularly unusual about this borrowing is that it is bidirectional, with almost equal numbers of words being borrowed from Jingulu into Mudburra as vice versa. This situation mirrors that of converted languages, where two varieties have come to share a grammar through contact, but retain separate lexicons. Design/methodology/approach: We use a comparative database to establish the direction of noun borrowings between these languages. Data and analysis: The comparative database consists of 871 nouns shared by Jingulu and Mudburra and also includes 571 corresponding nouns from a number of geographically and phylogenetically neighbouring languages: Wambaya, Gurindji, Jaminjung, Jaru, Warlmanpa and Warumungu. Findings/conclusions: We show that for nouns alone, Mudburra and Jingulu share 65% of their forms. What makes the Jingulu-Mudburra situation even more unusual is the relatively balanced bidirectional nature of borrowings, with 32% of shared nouns originating in Mudburra and 24.5% from Jingulu (for the remaining 43.5%, direction of borrowing could not be determined). Originality: We suggest that this situation of bidirectional borrowing represents a hitherto unreported type of language hybridisation scenario, which we dub ‘lexical convergence’. Significance/implications: We claim that this unusual situation is the result of long-term cohabitation of the two groups, a shared cultural life and relative socio-political equality between the two groups. 
We venture that these may be requisite to the sort of extensive bidirectional borrowing and maintenance of individual grammatical systems found in lexical convergence more generally.

7 citations en Computer Science
S2 Open Access 2020
The Oxford Handbook of African Languages

This book provides a comprehensive overview of current research in African languages, drawing on insights from anthropological linguistics, typology, historical and comparative linguistics, and sociolinguistics. Africa is believed to host at least one-third of the world’s languages, usually classified into four phyla—Niger-Congo, Afro-Asiatic, Nilo-Saharan, and Khoisan—which are then subdivided into further families and subgroupings. This volume explores all aspects of research in the field, beginning with chapters that cover the major domains of grammar and comparative approaches. Later parts provide overviews of the phyla and subfamilies, alongside grammatical sketches of eighteen representative African languages of diverse genetic affiliation. The volume additionally explores multiple other topics relating to African languages and linguistics, with a particular focus on extralinguistic issues: language, cognition, and culture, including color terminology and conversation analysis; language and society, including language contact and endangerment; language and history; and language and orature.

Page 26 of 185266