Results for "Language. Linguistic theory. Comparative grammar"

Showing 20 of ~4,432,070 results · from DOAJ, CrossRef, arXiv, Semantic Scholar

arXiv Open Access 2026
Doc2Spec: Synthesizing Formal Programming Specifications from Natural Language via Grammar Induction

Shihao Xia, Mengting He, Haomin Jia et al.

Ensuring that API implementations and usage comply with natural language programming rules is critical for software correctness, security, and reliability. Formal verification can provide strong guarantees but requires precise specifications, which are difficult and costly to write manually. To address this challenge, we present Doc2Spec, a multi-agent framework that uses LLMs to automatically induce a specification grammar from natural-language rules and then generates formal specifications guided by the induced grammar. The grammar captures essential domain knowledge, constrains the specification space, and enforces consistent representations, thereby improving the reliability and quality of generated specifications. Evaluated on seven benchmarks across three programming languages, Doc2Spec outperforms a baseline without grammar induction and achieves competitive results against a technique with a manually crafted grammar, demonstrating the effectiveness of automated grammar induction for formalizing natural-language rules.

en cs.PL, cs.AI
arXiv Open Access 2025
Grammaticality Judgments in Humans and Language Models: Revisiting Generative Grammar with LLMs

Lars G. B. Johnsen

What counts as evidence for syntactic structure? In traditional generative grammar, systematic contrasts in grammaticality, such as subject-auxiliary inversion and the licensing of parasitic gaps, are taken as evidence for an internal, hierarchical grammar. In this paper, we test whether large language models (LLMs), trained only on surface forms, reproduce these contrasts in ways that imply an underlying structural representation. We focus on two classic constructions: subject-auxiliary inversion (testing recognition of the subject boundary) and parasitic gap licensing (testing abstract dependency structure). We evaluate models including GPT-4 and LLaMA-3 using prompts that elicit acceptability ratings. Results show that LLMs reliably distinguish between grammatical and ungrammatical variants in both constructions, supporting the view that they are sensitive to structure and not merely to linear order. Structural generalizations, distinct from cognitive knowledge, emerge from predictive training on surface forms, suggesting functional sensitivity to syntax without explicit encoding.

en cs.CL
arXiv Open Access 2025
Breaking Physical and Linguistic Borders: Multilingual Federated Prompt Tuning for Low-Resource Languages

Wanru Zhao, Yihong Chen, Royson Lee et al.

Pre-trained large language models (LLMs) have become a cornerstone of modern natural language processing, with their capabilities extending across a wide range of applications and languages. However, the fine-tuning of multilingual LLMs, especially for low-resource languages, faces significant challenges arising from data-sharing restrictions (the physical border) and inherent linguistic differences (the linguistic border). These barriers hinder users of various languages, particularly those in low-resource regions, from fully benefiting from the advantages of LLMs. To address these challenges, we propose the Federated Prompt Tuning Paradigm for multilingual scenarios, which utilizes parameter-efficient fine-tuning while adhering to data-sharing restrictions. We design a comprehensive set of experiments and analyze them using a novel notion of language distance to highlight the strengths of our paradigm: even under computational constraints, our method not only improves data efficiency but also facilitates mutual enhancements across languages, particularly benefiting low-resource ones. Compared to traditional local cross-lingual transfer tuning methods, our approach achieves 6.9% higher accuracy with improved data efficiency, and demonstrates greater stability and generalization. These findings underscore the potential of our approach to promote social equality and champion linguistic diversity, ensuring that no language is left behind.

en cs.CL
arXiv Open Access 2025
A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans

Anca Dinu, Andra-Maria Florescu, Alina Resceanu

This paper introduces a general linguistic creativity test for humans and Large Language Models (LLMs). The test consists of various tasks aimed at assessing their ability to generate new, original words and phrases based on word-formation processes (derivation and compounding) and on metaphorical language use. We administered the test to 24 humans and to an equal number of LLMs, and we automatically evaluated their answers using the OCSAI tool on three criteria: Originality, Elaboration, and Flexibility. The results show that LLMs not only outperformed humans on all assessed criteria but also did better in six of the eight test tasks. We then computed the uniqueness of the individual answers, which showed some minor differences between humans and LLMs. Finally, a short manual analysis of the dataset revealed that humans are more inclined towards E(xtending)-creativity, while LLMs favor F(ixed)-creativity.

arXiv Open Access 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling

Ju-Chieh Chou, Jiawei Zhou, Karen Livescu

Textless spoken language models (SLMs) are generative models of speech that do not rely on text supervision. Most textless SLMs learn to predict the next semantic token, a discrete representation of linguistic content, and rely on a separate vocoder to add acoustic information to the generated speech. Such models have no access to acoustic context and no built-in control over acoustic details. In this work, we propose to jointly model linguistic and acoustic information by generating semantic tokens and a continuous real-valued representation of the acoustic frame. We use a flow-matching objective to predict the continuous vector conditioned on the semantic tokens. We study the design space of this approach and find that predicting multiple future semantic tokens helps preserve linguistic information. Our approach achieves comparable performance to existing models in terms of linguistic likelihood benchmarks, while providing better acoustic detail in prompted generation.

en cs.CL
arXiv Open Access 2025
Controlling Language Difficulty in Dialogues with Linguistic Features

Shuyao Xu, Wenguang Wang, Handong Gao et al.

Large language models (LLMs) have emerged as powerful tools for supporting second language acquisition, particularly in simulating interactive dialogues for speaking practice. However, adapting the language difficulty of LLM-generated responses to match learners' proficiency levels remains a challenge. This work addresses this issue by proposing a framework for controlling language proficiency in educational dialogue systems. Our approach leverages three categories of linguistic features to quantify and regulate text complexity: readability features (e.g., Flesch-Kincaid Grade Level), syntactic features (e.g., syntactic tree depth), and lexical features (e.g., simple word ratio). We demonstrate that training LLMs on linguistically annotated dialogue data enables precise modulation of language proficiency, outperforming prompt-based methods in both flexibility and stability. To evaluate this, we introduce Dilaprix, a novel metric integrating the aforementioned features, which shows strong correlation with expert judgments of language difficulty. Empirical results reveal that our approach achieves superior controllability of language proficiency while maintaining high dialogue quality.

en cs.CL
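The Dilaprix entry names the Flesch-Kincaid Grade Level as one of its readability features. As a rough illustration only, the standard published FKGL formula can be computed from word, sentence, and syllable counts. This is a minimal sketch assuming those counts are already available (syllable counting heuristics vary and are left to the caller); it does not reflect how the paper itself computes the feature.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level from raw counts.

    FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

    The counts are assumed to come from a separate tokenizer and syllable
    counter (not shown here; the choice of heuristic is up to the caller).
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# A 100-word text with 5 sentences and 150 syllables scores roughly grade 9.9.
print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

Taking counts rather than raw text keeps the formula separate from tokenization, which is where readability tools most often disagree.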
S2 Open Access 2025
Understanding Negation in Zimbabwean Sign Language:

Tawanda Matende, Crous Hlungwani

This study explores negation in Zimbabwean Sign Language (ZSL), addressing the limited documentation of African sign languages. Using qualitative analysis of dialogues, discourse, and folklore, it examines how manual and non-manual features combine to express negation. Findings show that ZSL employs manual signs (e.g., NO, NOT) paired with obligatory non-manual markers such as headshakes, lowered eyebrows, and squinted eyes. Syntactic flexibility allows negation to appear pre- or post-verbally, influenced by SOV word order, although headshakes remain obligatory across clauses. Unique ZSL features include bound negative morphemes (e.g., HAVE NOT) and formal and informal sign variants shaped by context and pragmatics. Comparative analysis reveals overlaps with ASL, SASL, and BSL but highlights ZSL's distinctiveness, such as grammatical non-manual negation and facial reinforcement of inherently negative signs (e.g., WRONG). The study challenges generalizations drawn from dominant sign language typologies, advocating deeper investigation of understudied languages to refine cross-linguistic theories. By detailing ZSL's negation system, the research contributes to linguistic typology, enhances understanding of ZSL grammar, and underscores the need for language-specific studies to prevent overgeneralization. It also discusses implications for deaf education and policy in Zimbabwe, urging pedagogical approaches aligned with ZSL's structural nuances. This work underscores the necessity of recognizing the unique features of regional sign languages in both academic research and educational frameworks, promoting linguistic diversity and culturally responsive practices.

S2 Open Access 2025
A Conceptual Framework to Explain the Interface of Syntax-Semantics in Idiomatic Expressions

Muzaina Awni Saleem

This research proposes a conceptual framework for accounting for the syntax-semantics interface in idiomatic expressions, whose often non-compositional nature poses a substantial challenge to classical linguistic theory, which assumes that meaning is composed from word meaning and syntactic structure. A significant gap remains: no single framework explains how syntactic composition interacts with non-literal meaning in idiomatic phrasing, particularly since idioms vary in their compositionality. This gap makes idioms hard to analyze and interpret across languages, where syntactic rigidity tends to coexist with semantic opacity or metaphorical richness. In probing the intricate relationship between syntax and meaning, the present research puts forward a broad theoretical framework that brings together insights from generative grammar, construction grammar, and cognitive linguistics. The framework is intended to cover the full range of compositionality among idioms, from fully opaque to partially transparent ones. The research sheds light on the processing and interpretation of idiomatic expressions across languages, pointing to the necessity of both syntactic structure and metaphorical meaning for idiom understanding.

S2 Open Access 2025
LINEARIZATION IN ADJECTIVAL PATTERNS: A CASE STUDY IN ENGLISH AND ARMENIAN

A. Hovhannisyan

The contrastive study of languages remains one of the most vital areas of comparative linguistics, as it reveals the worldview of entire nations. In lexicology, compounding is a fascinating area of research. It is essential to investigate the theory of compounding in both English and Armenian, exploring various linguistic perspectives, providing relevant examples, and discussing the implications of contrasting these languages. This study undertakes a comparative analysis of adjectival compounds in English and Armenian, with a particular focus on the morphological patterns of adjectival compounding. The data reveal both isomorphic and distinct (allomorphic) structural compositions across the two languages. A thorough comparative study of this kind offers valuable insights into linguistic phenomena, and its outcomes can contribute substantially to several fields, including comparative grammar, contrastive lexicology, contrastive grammar, and translation studies. Moreover, the proposed universal hypotheses concern the architecture of grammar while accounting for language-specific characteristics.

DOAJ Open Access 2024
Translators’ and interpreters’ engagement with professional development in Australia: An analysis of key factors

Jim Hlavac, Shani Tobias, Lola Sundin et al.

Professional development (PD) aims to facilitate the maintenance, improvement and broadening of knowledge and skills, and has become a standard or even compulsory component of professional practice for many occupational groups. This paper traces the uptake of PD amongst certified translators and interpreters in Australia, where it was introduced in 2014 as a requirement for newly certified practitioners only, and in 2019 for all holders of translation or interpreting certification from the national certifying authority. Based on responses from a sample of 3,268 practitioners, we report high uptake overall, with little variation by level of qualification. Slightly lower uptake is recorded only amongst 'newcomers' with less experience; for all others, it is consistently high. Lower uptake is also recorded amongst those who work 1-10 hours per week and those earning up to A$10,000 per year, compared with those working more hours and those earning more. A desire for more work does not co-occur with elevated levels of PD uptake. The data presented reflect the reported experiences of those who had already been required to engage with PD, those for whom this requirement was new, with a three-year window to undertake PD, and those for whom it still remains optional. These findings contribute to our understanding of PD uptake amongst a professional group whose engagement with post-certification training has been under-studied. Findings may inform relevant stakeholders in other countries considering measures to arrest atrophy and extend the skill sets of practising translators and interpreters.

Translating and interpreting
DOAJ Open Access 2024
Marc Angenot: Rhetoric Put to the Test of the History of Ideas

Marc Angenot, Marianne Doury, Théophile Robineau

In the interview he gave to Marianne Doury and Théophile Robineau, Marc Angenot reflects on the place of rhetoric in his work. Recalling that classical rhetoric is essentially based on the judicial model, he highlights its value, but also its limits, for exploring the social discourse he seeks to account for. He adopts its globalizing perspective, mobilizing ethos, pathos and logos, whose combined consideration is necessary for understanding ideas and the way they are carried and debated in society. However, he insists on the need to take into account how social discourse is inscribed in a longer- or shorter-term history, a condition of its intelligibility. Taking social discourse as an object of research also requires reconsidering the notion of situation as traditionally conceived by rhetoric, and redefining how the question of persuasion is approached.

Style. Composition. Rhetoric
arXiv Open Access 2024
LMLPA: Language Model Linguistic Personality Assessment

Jingyao Zheng, Xian Wang, Simo Hosio et al.

Large Language Models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interaction, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs' language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates findings from previous literature on language-based personality measurement. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. An AI rater is therefore needed to transform ambiguous personality information in the text responses into clear numerical indicators of personality traits. Utilising Principal Component Analysis and reliability validations, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Computer Interaction and Human-Centered AI, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.

en cs.CL, cs.AI
arXiv Open Access 2024
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing

Tianchi Liu, Ivan Kukanov, Zihan Pan et al.

Language mismatch affects speech anti-spoofing systems, yet investigation and quantification of its effects remain limited. Existing anti-spoofing datasets are mainly in English, and the high cost of acquiring multilingual datasets hinders the training of language-independent models. We begin by evaluating top-performing speech anti-spoofing systems that are trained on English data but tested on other languages, observing notable performance declines. We propose an innovative approach, Accent-based data expansion via TTS (ACCENT), which introduces diverse linguistic knowledge to monolingually trained models, improving their cross-lingual capabilities. We conduct experiments on a large-scale dataset of over 3 million samples, including 1.8 million training samples and nearly 1.2 million testing samples across 12 languages. The language mismatch effects are preliminarily quantified and reduced by over 15% by applying the proposed ACCENT. This easily implementable method shows promise for multilingual and low-resource language scenarios.

en eess.AS, cs.AI
S2 Open Access 2023
Digital Tools in Teaching the Mass Media Language

L. Mialkovska, L. Zhvania, M. Rozhylo et al.

The functioning of language in modern media is a complex set of different types of discourse. It involves the use of mental and cultural codes, concepts, and archetypes; attention to the specifics of Internet content and the methods of its promotion alongside traditional newspaper journalism; and knowledge of the basics of cognitive, communicative, and information-theoretical theories and methods. The purpose of the paper is to clarify the features and current trends of teaching mass media language with digital tools, and to establish practical aspects of using such educational means when teaching mass media language. The analytical-bibliographic method was used to study the scientific literature on teaching mass media language with digital tools. In addition, induction, deduction, analysis, synthesis of information, system-structural, comparative, and logical-linguistic methods, abstraction, and idealization were applied to study and process the data. The authors also conducted an online questionnaire survey to clarify certain practical aspects of using digital educational tools in teaching mass media language. Based on the results, the primary and most significant theoretical aspects of teaching mass media language with digital educational tools were highlighted. Moreover, the views of students and teachers at higher education institutions on the key aspects of this issue were clarified.

8 citations en
DOAJ Open Access 2023
Sartrean Ethics and Emotive Nuisance in Kafkaesque World

Muhammad Adnan Akbar, Maria Farooq Maan

This study investigates the integration of Sartrean ethical principles in Kafka's literary works and challenges the usefulness of existentialist ethics. Sartre's Notebooks for an Ethics (1983) argues that existentialism is a practical ethical theory that challenges the separation of theoretical and practical aspects. Warnock echoes this in Existentialist Ethics (1967). By examining key works by Sartre, including Existentialism and Humanism (1946) and Being and Nothingness (1943), the research explores the fundamental concepts of Sartrean ethics, which include freedom, bad faith, responsibility, and anguish. Sartre rejects absolute values, prioritizing subjectivity while acknowledging authenticity and good faith. Although Kierkegaard and Heidegger do not explicitly address existential ethics, they contribute to ethical concerns. The study employs a qualitative phenomenological approach, emphasizing Ricoeur's hermeneutic phenomenology. The theoretical framework is based on the concepts of Sartrean ethics and emotive nuisance. The epistemological position aligns with Heidegger's interpretive technique. In the Kafkaesque world, characters struggle with existential perplexity amid the horrors of the modern age, exploring the traumas of existence. The research develops systematic frameworks for understanding the ethical standpoint of this world, where characters are entangled in chaotic existential paraphernalia and emotive nuisance. Emotions linked to existential ethics are examined to clarify the impact of emotion on ethical conduct.

English literature, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2023
Introduction special issue: marking the truth: a cross-linguistic approach to verum

Izabela Jordanoska, Anna Kocher, Raúl Bendezú-Araujo

This special issue focuses on the theoretical and empirical underpinnings of truth-marking. The names that have been used to refer to this phenomenon include, among others, counter-assertive focus, polar(ity) focus, verum focus, emphatic polarity or simply verum. This terminological variety is suggestive of the wide range of ideas and conceptions that characterizes this research field. This collection aims to get closer to the core of what truly constitutes verum. We want to expand the empirical base and determine the common and diverging properties of truth-marking in the languages of the world. The objective is to set a theoretical and empirical baseline for future research on verum and related phenomena.

Language. Linguistic theory. Comparative grammar

Page 13 of 221,604