Human-robot collaboration in industrial settings requires precise and reliable communication to enhance operational efficiency. While Large Language Models (LLMs) understand general language, they often lack the domain-specific rigidity needed for safe and executable industrial commands. To address this gap, this paper introduces a novel grammar-constrained LLM that integrates a grammar-driven Natural Language Understanding (NLU) system with a fine-tuned LLM, which enables both conversational flexibility and the deterministic precision required in robotics. Our method employs a two-stage process. First, a fine-tuned LLM performs high-level contextual reasoning and parameter inference on natural language inputs. Second, a Structured Language Model (SLM) and a grammar-based canonicalizer constrain the LLM's output, forcing it into a standardized symbolic format composed of valid action frames and command elements. This process guarantees that generated commands are valid and structured in a robot-readable JSON format. A key feature of the proposed model is a validation and feedback loop. A grammar parser validates the output against a predefined list of executable robotic actions. If a command is invalid, the system automatically generates corrective prompts and re-engages the LLM. This iterative self-correction mechanism allows the model to recover from initial interpretation errors to improve system robustness. We evaluate our grammar-constrained hybrid model against two baselines: a fine-tuned API-based LLM and a standalone grammar-driven NLU model. Using the Human Robot Interaction Corpus (HuRIC) dataset, we demonstrate that the hybrid approach achieves superior command validity, which promotes safer and more effective industrial human-robot collaboration.
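The validation and feedback loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the action inventory `EXECUTABLE_ACTIONS`, the JSON layout (`action` plus `args`), the wording of the corrective prompt, and the `generate` callable standing in for the fine-tuned LLM are all assumptions made for the example.

```python
import json
from typing import Callable, Optional

# Hypothetical inventory of executable actions and their required arguments.
# The abstract does not list the real action frames.
EXECUTABLE_ACTIONS = {
    "pick": {"object"},
    "place": {"object", "location"},
    "move": {"location"},
}


def validate(raw: str) -> Optional[str]:
    """Return None if `raw` is a valid command, else a short error description."""
    try:
        cmd = json.loads(raw)
    except json.JSONDecodeError:
        return "output is not valid JSON"
    if not isinstance(cmd, dict):
        return "output must be a JSON object"
    action = cmd.get("action")
    if not isinstance(action, str) or action not in EXECUTABLE_ACTIONS:
        return f"unknown action {action!r}; allowed: {sorted(EXECUTABLE_ACTIONS)}"
    args = cmd.get("args")
    if not isinstance(args, dict):
        return "missing 'args' object"
    missing = EXECUTABLE_ACTIONS[action] - set(args)
    if missing:
        return f"missing arguments for {action!r}: {sorted(missing)}"
    return None


def command_with_feedback(
    utterance: str,
    generate: Callable[[str], str],  # stand-in for the LLM call (assumption)
    max_rounds: int = 3,
) -> Optional[dict]:
    """Query the model, validate its output, and re-prompt with the error if invalid."""
    prompt = utterance
    for _ in range(max_rounds):
        raw = generate(prompt)
        error = validate(raw)
        if error is None:
            return json.loads(raw)
        # Corrective prompt: restate the request together with the validation error.
        prompt = (
            f"{utterance}\n"
            f"Your previous output was rejected: {error}. "
            "Reply with corrected JSON only."
        )
    return None  # give up after max_rounds invalid outputs
```

Returning `None` after a bounded number of rounds is one possible design choice; it keeps an unrecoverable utterance from ever reaching the robot, at the cost of requiring the caller to handle the failure.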
The compact directed acyclic word graph (CDAWG) [Blumer et al. 1987] of a string is the minimal compact automaton that recognizes all the suffixes of the string. CDAWGs are known to be useful for various string tasks including text pattern searching, data compression, and pattern discovery. The CDAWG-grammar [Belazzougui & Cunial 2017] is a grammar-based text compression scheme built on the CDAWG. In this paper, we prove that the CDAWG-grammar size $g$ can increase by at most an additive term of $4e + 4$ over its original value after any single-character edit operation is performed on the input string, where $e$ denotes the number of edges in the corresponding CDAWG before the edit.
Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner, because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approach to constructing a principled, compact and equivalent sequential representation of a DAG. Specifically, we view a graph as derivations over an unambiguous grammar, where the DAG corresponds to a unique sequence of production rules. Equivalently, the procedure to construct such a description can be viewed as a lossless compression of the data. Such a representation has many uses, including building a generative model for graph generation, learning a latent space for property prediction, and leveraging the continuity of the sequence representation for Bayesian optimization over structured data. Code is available at https://github.com/shiningsunnyday/induction.
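The decoding difficulty mentioned in this abstract, that one DAG admits many node orderings, can be made concrete with a small enumeration. The sketch below is only an illustration of that ambiguity; it is not the grammar-based method of the paper, and the function name and the example graph are assumptions.

```python
from typing import Dict, Iterator, List


def all_topological_orders(edges: Dict[str, List[str]]) -> Iterator[List[str]]:
    """Yield every topological order of the DAG given as an adjacency list."""
    nodes = sorted(set(edges) | {v for targets in edges.values() for v in targets})
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for v in targets:
            indegree[v] += 1

    order: List[str] = []
    placed = set()

    def backtrack() -> Iterator[List[str]]:
        if len(order) == len(nodes):
            yield list(order)
            return
        for n in nodes:
            # A node can be placed next once all of its predecessors are placed.
            if n not in placed and indegree[n] == 0:
                placed.add(n)
                order.append(n)
                for v in edges.get(n, []):
                    indegree[v] -= 1
                yield from backtrack()
                # Undo the choice before trying the next candidate.
                for v in edges.get(n, []):
                    indegree[v] += 1
                order.pop()
                placed.remove(n)

    yield from backtrack()


# A four-node "diamond": a -> b, a -> c, b -> d, c -> d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
orders = list(all_topological_orders(diamond))
print(orders)  # [['a', 'b', 'c', 'd'], ['a', 'c', 'b', 'd']]
```

Even this tiny graph has two valid orders, and the count grows quickly with the number of mutually independent nodes, which is why a sequential representation that is unique per DAG is attractive for decoding.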
Structured generation for LLM tool use highlights the value of compact DSL intermediate representations (IRs) that can be emitted directly and parsed deterministically. This paper introduces axial grammar: linear token sequences that recover multi-dimensional structure from the placement of rank-specific separator tokens. A single left-to-right pass assigns each token a coordinate in an n-dimensional grid, enabling deterministic parsing without parentheses or clause-heavy surface syntax. This grammar is instantiated in Memelang, a compact query language intended as an LLM-emittable IR whose fixed coordinate roles map directly to table/column/value slots. Memelang supports coordinate-stable relative references, parse-time variable binding, and implicit context carry-forward to reduce repetition in LLM-produced queries. It also encodes grouping, aggregation, and ordering via inline tags on value terms, allowing grouped execution plans to be derived in one streaming pass over the coordinate-indexed representation. Provided are a reference lexer/parser and a compiler that emits parameterized PostgreSQL SQL (optionally using pgvector operators).
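The single-pass coordinate assignment described in this abstract might look something like the sketch below. The separator symbols (`;` and `;;`), the rule that a rank-r separator advances axis r and resets all lower axes, and the function name are assumptions chosen for illustration; the abstract does not specify Memelang's concrete syntax.

```python
from typing import Dict, List, Tuple

# Hypothetical separators: each maps to the rank (axis index) it advances.
SEPARATORS: Dict[str, int] = {";": 1, ";;": 2}


def assign_coordinates(
    tokens: List[str], separators: Dict[str, int], n_dims: int
) -> List[Tuple[str, Tuple[int, ...]]]:
    """Assign each non-separator token a coordinate in one left-to-right pass."""
    coord = [0] * n_dims
    placed: List[Tuple[str, Tuple[int, ...]]] = []
    for tok in tokens:
        rank = separators.get(tok)
        if rank is None:
            # Ordinary token: record its position, then step along axis 0.
            placed.append((tok, tuple(coord)))
            coord[0] += 1
        else:
            if not 1 <= rank < n_dims:
                raise ValueError(f"separator {tok!r} has rank {rank} outside the grid")
            # Separator: advance its own axis and reset every lower axis.
            coord[rank] += 1
            for axis in range(rank):
                coord[axis] = 0
    return placed


cells = assign_coordinates(["a", "b", ";", "c", ";;", "d"], SEPARATORS, 3)
print(cells)
# [('a', (0, 0, 0)), ('b', (1, 0, 0)), ('c', (0, 1, 0)), ('d', (0, 0, 1))]
```

Because the coordinate is fully determined by the separators seen so far, no stack or lookahead is needed, which is the property that makes parsing deterministic without parentheses.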
This paper aims to explore how institutions may counteract conspiracy theories using appropriate discursive resources. We use a rhetorical approach to analyze the first European information campaign launched in 2020 to counteract conspiracy theories about COVID-19 vaccines. On this basis, we advance a series of practical recommendations for institutions to counteract conspiracy theories through information campaigns.
Dinesh Parthasarathy, Wayne Bradford Mitchell, Harald Köstler
Multigrid methods, despite being known to be asymptotically optimal algorithms, depend on the careful selection of their individual components for efficiency. They are also mostly restricted to standard cycle types such as V-, F-, and W-cycles. We use grammar rules to generate arbitrarily shaped cycles, in which the smoothers and their relaxation weights are chosen independently at each step within the cycle. We call this a flexible multigrid cycle. These flexible cycles are used in Algebraic Multigrid (AMG) methods with the help of grammar rules and optimized using genetic programming. The flexible AMG methods are implemented in the hypre software library, and the programs are optimized separately for two cases: a standalone AMG solver for a 3D anisotropic problem and an AMG preconditioner with conjugate gradient for a multiphysics code. We observe that the optimized flexible cycles provide higher efficiency and better performance than the standard cycle types.
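To give a feel for how grammar rules can produce arbitrarily shaped cycles, here is a toy derivation in Python. It is a sketch under strong assumptions: the three production rules, the smoother names, the weight values, and the expansion budget are invented for the example, and it involves neither genetic programming nor any hypre call.

```python
import random
from typing import List, Tuple

SMOOTHERS = ["jacobi", "gauss_seidel"]  # hypothetical smoother names
WEIGHTS = [0.6, 0.8, 1.0]               # hypothetical relaxation weights


def derive_cycle(
    level: int, coarsest: int, rng: random.Random, budget: List[int]
) -> List[Tuple]:
    """Expand the nonterminal C_level into a list of terminal steps.

    Toy rules (assumed for illustration):
      C_coarsest -> coarse_solve
      C_l        -> smooth
      C_l        -> smooth restrict C_{l+1} prolong C_l
    `budget` is a one-element list holding the number of recursive
    expansions still allowed, so every derivation terminates.
    """
    if level == coarsest:
        return [("coarse_solve", level)]
    # Smoother and weight are drawn independently at every smoothing step.
    smooth = ("smooth", level, rng.choice(SMOOTHERS), rng.choice(WEIGHTS))
    if budget[0] <= 0 or rng.random() < 0.3:
        return [smooth]
    budget[0] -= 1
    return (
        [smooth, ("restrict", level)]
        + derive_cycle(level + 1, coarsest, rng, budget)
        + [("prolong", level + 1)]
        + derive_cycle(level, coarsest, rng, budget)
    )


cycle = derive_cycle(level=0, coarsest=3, rng=random.Random(0), budget=[10])
for step in cycle:
    print(step)
```

Every derivation starts and ends on the finest level, but the sequence of visited levels in between is not tied to a V-, F-, or W-shape. In a genetic programming setting, derivations like this would be the individuals that are mutated, recombined, and scored by solver performance.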
Recent advances in language modeling have demonstrated significant improvements in zero-shot capabilities, including in-context learning, instruction following, and machine translation for extremely under-resourced languages (Tanzer et al., 2024). However, many languages with limited written resources rely primarily on formal descriptions of grammar and vocabulary. In this paper, we introduce a set of benchmarks to evaluate how well models can extract and classify information from the complex descriptions found in linguistic grammars. We present a Retrieval-Augmented Generation (RAG)-based approach that leverages these descriptions for downstream tasks such as machine translation. Our benchmarks encompass linguistic descriptions for 248 languages across 142 language families, focusing on typological features from WALS and Grambank. This set of benchmarks offers the first comprehensive evaluation of language models' in-context ability to accurately interpret and extract linguistic features, providing a critical resource for scaling NLP to low-resource languages. The code and data are publicly available at https://github.com/al-the-eigenvalue/RAG-on-grammars.
Akihiro Kishimoto, Hiroshi Kajino, Masataka Hirose, et al.
Property prediction plays an important role in materials discovery. As an initial step toward developing a foundation model for materials science, we introduce a new autoencoder called MHG-GNN, which combines a graph neural network (GNN) with a Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.
ABSTRACT: Introduction: The practice of journalism in the digital space is currently gaining prominence, so this article aims to propose a strategy for raising the quality of the hypermedia journalism produced by graduates of the Journalism program at the Universidad Central «Marta Abreu» de Las Villas. Methods: Methods such as grounded theory and phenomenology were applied to obtain the results. Results: Changes are designed along three fundamental lines (professional training, the media, and postgraduate training) that contribute to raising the level of graduates. Conclusions: The importance of raising the quality of hypermedia journalism among graduates calls for an interdisciplinary outlook and interdisciplinary actions.
Philology. Linguistics, Language. Linguistic theory. Comparative grammar
Noun class is widely seen as “standing out” from other morphosyntactic categories in having a basis in ontological beliefs, or a ‘semantic core’. The consequence of this view is that noun classes in natural languages frequently do not cohere semantically. Here I motivate an aspectual alternative according to which noun class is grounded in low-level cognitive processes including the detection of agency and sex-related cues (including shape/size) and ‘mode’ of attention. This suggests a way of bringing noun class more into line with the perspectivizing contribution of morphosyntactic features in general.
Olga A. Solopova, Natalya N. Koshkarova, Igor V. Sibiriakov
The paper studies the evolution of the image of Chelyabinsk in 20th-century British media discourse. The research is relevant because it combines linguistic and historical analyses; it aims at a retrospective study of how the image of a foreign city evolved in British media discourse over a long time span. A wide range of methods is employed in the study: comparative, diachronic, cognitive-matrix, and cognitive-discursive methods, source study, and content analysis. The data source is a digitized archive of British historical media texts. The authors recorded nine variants of the city name. The frequency with which the image of Chelyabinsk is modelled varies: it is rather high at the beginning of the century, declines in the second decade, reaches its minimum in 1921-1930, and rises again in the subsequent decades, which is explained by the British media's interest in industrialization and the events of World War II. Most of the newspapers and magazines that modelled the image of Chelyabinsk were published in the capitals and large industrial centres, which is explained by the peculiarities of British print media, the higher level of education of residents of large cities, and Britain's economic interests in Russia / the Soviet Union. The significant difference between the images of Chelyabinsk across time lies in their emotive load: the negative images of the beginning of the century contrast with the positive images generated in the latest time span.
Language. Linguistic theory. Comparative grammar, Semantics
The article examines three types of war that Russia has imposed on Ukraine in hybrid fashion: the "war of Ares", the "war of Athena", and the "war of Apollo", as well as the causes, course, and results of using culture and literature as a glittering veil to mask the imperial essence of the "Russian world" («русский мир»), of "Russia's crooked mug" («кривая рожа России», M. Gogol). It analyzes why, despite the large-scale aggression launched by the Russian Federation against Ukraine on 24.02.2022 and the war crimes of the Rashists proven by international courts, reverence for Russia and its "great" literature and culture still persists in broad circles of the world community and even of Ukrainian society. It concludes that the deadly dangerous (like "Snow White's apple") combination of aesthetic appeal, on the one hand, and imperial ideological toxicity, on the other (especially under the full-scale military aggression of the Russian Federation, when even the Russian language itself, in which these works are written, has become a trigger for millions of Ukrainians), makes Russian literature absolutely unacceptable for study in Ukraine's secondary education institutions. It traces the origins and stages of entrenchment of the myth of the "world greatness" of Russian literature and draws the substantiated conclusion that the significant share of Russian works in our school curricula is not evidence of their supposed "world-class" ideological and aesthetic level but a heavy legacy of the imperial (including Soviet) era, when the population of the lands colonized by Muscovy (including Ukraine) was subjected to forced assimilation (Russification, «обрусение»), so that everything Russian was imposed by force. It forecasts effective ways of adjusting the strategies for studying Russian literature in Ukraine's higher education institutions: intensive use of postcolonial interpretation and comparative analysis, renewal of the range of literary works studied, and new approaches to the study of writers' biographies. It is noted that the strategic turn in the teaching of Russian literature and culture in Ukraine's higher education institutions will require titanic efforts not only from educators but from the entire state, together with the development and implementation of a special targeted state program.
Keywords: "war of Apollo", hybrid war, glorification of the imperial literary canon, imperial myth, national identity, postcolonial studies, "Rashism", semantic (paradigmatic) war, "troubadours of the Empire".
Discourse analysis, Computational linguistics. Natural language processing
Critical responses to the rise of right-wing populism in the Western world have done much to draw attention to the racialization of moral economies. However, it is not only remarkable that class has returned to the grammar of politics as an intractably racialized category – the white-working-class; it is just as remarkable that the racialized moral opprobrium of the underclass has given way rhetorically and ideologically to a racialized moral commitment to social justice for the ordinary working class. More critical reflection is needed to understand the way in which the imagined constituency of populist lore is worthy of redemption not just by virtue of their whiteness but of their white-ordinary-working-classness. This article presents a series of key comparative moments in debates over social security and welfare provision – past and present – that demonstrate the centrality of labour’s ‘cooperative spirit’ for political-philosophical debates over social security and welfare. To this end, the author methodologically sketches out a set of political ‘grammars’ that through these debates frame ethical quandaries and policy prescriptions. The author argues that such political grammars have variously apprehended the orderly or disorderly nature of labour’s cooperative spirit by reference to patriarchal and eugenic filiations. While the debates interrogated here have no doubt utilized different terms and categories, their grammars resonate strongly. This gives cause to consider that the redemption of the ‘ordinary’ working class requires the segregation of that class along imperial – and postimperial – lines of heredity.
Aims and objectives/purpose/research questions: This article describes an unusual result of language contact occurring in North-Central Australia, where extensive long-term contact between speakers of the genetically unrelated Jingulu and Mudburra has resulted in a high degree of lexical borrowing, with little if any change to syntactic or morphological structure in either language. What is particularly unusual about this borrowing is that it is bidirectional, with almost equal numbers of words being borrowed from Jingulu into Mudburra as vice versa. This situation mirrors that of converted languages, where two varieties have come to share a grammar through contact, but retain separate lexicons. Design/methodology/approach: We use a comparative database to establish the direction of noun borrowings between these languages. Data and analysis: The comparative database consists of 871 nouns shared by Jingulu and Mudburra and also includes 571 corresponding nouns from a number of geographically and phylogenetically neighbouring languages: Wambaya, Gurindji, Jaminjung, Jaru, Warlmanpa and Warumungu. Findings/conclusions: We show that for nouns alone, Mudburra and Jingulu share 65% of their forms. What makes the Jingulu-Mudburra situation even more unusual is the relatively balanced bidirectional nature of borrowings, with 32% of shared nouns originating in Mudburra and 24.5% from Jingulu (for the remaining 43.5%, direction of borrowing could not be determined). Originality: We suggest that this situation of bidirectional borrowing represents a hitherto unreported type of language hybridisation scenario, which we dub ‘lexical convergence’. Significance/implications: We claim that this unusual situation is the result of long-term cohabitation of the two groups, a shared cultural life and relative socio-political equality between the two groups. We venture that these may be requisite to the sort of extensive bidirectional borrowing and maintenance of individual grammatical systems found in lexical convergence more generally.
This book provides a comprehensive overview of current research in African languages, drawing on insights from anthropological linguistics, typology, historical and comparative linguistics, and sociolinguistics. Africa is believed to host at least one-third of the world’s languages, usually classified into four phyla—Niger-Congo, Afro-Asiatic, Nilo-Saharan, and Khoisan—which are then subdivided into further families and subgroupings. This volume explores all aspects of research in the field, beginning with chapters that cover the major domains of grammar and comparative approaches. Later parts provide overviews of the phyla and subfamilies, alongside grammatical sketches of eighteen representative African languages of diverse genetic affiliation. The volume additionally explores multiple other topics relating to African languages and linguistics, with a particular focus on extralinguistic issues: language, cognition, and culture, including color terminology and conversation analysis; language and society, including language contact and endangerment; language and history; and language and orature.