V. Fromkin, R. Rodman, N. Hyams
Results for "Language. Linguistic theory. Comparative grammar"
Showing 20 of ~4,437,777 results · from CrossRef, DOAJ, arXiv, Semantic Scholar
Feifei Li, Xiao Chen, Xiaoyu Sun et al.
Grammar inference for complex programming languages remains a significant challenge, as existing approaches fail to scale to real-world datasets within practical time constraints. In our experiments, none of the state-of-the-art tools, including Arvada, Treevada, and Kedavra, were able to infer grammars for complex languages such as C, C++, and Java within 48 hours. Arvada and Treevada perform grammar inference directly on full-length input examples, which proves inefficient for the large files commonly found in such languages. While Kedavra introduces data decomposition to create shorter examples for grammar inference, its lexical analysis still relies on the original inputs, and its strict no-overgeneralization constraint limits the construction of complex grammars. To overcome these limitations, we propose Crucio, which builds a decomposition forest to extract short examples for lexical and grammar inference via a distributional matrix. Experimental results show that Crucio is the only method capable of successfully inferring grammars for complex programming languages (where the number of nonterminals is up to 23x greater than in prior benchmarks) within reasonable time limits. On the prior simple benchmark, Crucio achieves an average recall improvement of 1.37x and 1.19x over Treevada and Kedavra, respectively, and improves F1 scores by 1.21x and 1.13x.
Elina Sigdel, Anastasia Panfilova
Defining psycholinguistic characteristics in written texts is a task gaining increasing attention from researchers. One of the most widely used tools in the field is Linguistic Inquiry and Word Count (LIWC), which was originally developed to analyze English texts and has since been translated into multiple languages. Our approach adapts the LIWC methodology to the Russian language, taking into account its grammatical and cultural specificities. The suggested approach comprises 96 categories, integrating syntactic, morphological, lexical, and general statistical features, along with predictions obtained from pre-trained language models (LMs) for text analysis. Rather than directly translating existing thesauri, we built the dictionary specifically for the Russian language based on content from several lexicographic resources, semantic dictionaries, and corpora. The paper describes the process of mapping lemmas to 42 psycholinguistic categories and the implementation of the analyzer as part of the RusLICA web service.
Alina Klerings, Jannik Brinkmann, Daniel Ruffinelli et al.
Large language models (LLMs) are able to generate grammatically well-formed text, but how do they encode their syntactic knowledge internally? While prior work has focused largely on binary grammatical contrasts, in this work, we study the representation and control of two multidimensional hierarchical grammar phenomena - verb tense and aspect - and for each, identify distinct, orthogonal directions in residual space using linear discriminant analysis. Next, we demonstrate causal control over both grammatical features through concept steering across three generation tasks. Then, we use these identified features in a case study to investigate factors influencing effective steering in multi-token generation. We find that steering strength, location, and duration are crucial parameters for reducing undesirable side effects such as topic shift and degeneration. Our findings suggest that models encode tense and aspect in structurally organized, human-like ways, but effective control of such features during generation is sensitive to multiple factors and requires manual tuning or automated optimization.
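The direction-finding step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the activations here are synthetic stand-ins for LLM residual-stream states, and all names (`steer`, `alpha`, the class setup) are assumptions for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic stand-in for residual-stream activations: two "tense" classes
# separated along a hidden direction (real work would use LLM hidden states).
d = 16
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
past = rng.normal(size=(200, d)) - 1.5 * true_dir
pres = rng.normal(size=(200, d)) + 1.5 * true_dir
X = np.vstack([past, pres])
y = np.array([0] * 200 + [1] * 200)

# LDA yields a discriminant direction for the grammatical contrast.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
direction = lda.scalings_[:, 0]
direction /= np.linalg.norm(direction)
# Orient the direction from class 0 (past) toward class 1 (present).
if direction @ (pres.mean(axis=0) - past.mean(axis=0)) < 0:
    direction = -direction

def steer(h, direction, alpha):
    """Concept steering: shift an activation along the learned direction."""
    return h + alpha * direction

h = past[0]
h_steered = steer(h, direction, alpha=3.0)
# Steering toward class 1 raises the class-1 decision score.
print(lda.decision_function([h])[0], lda.decision_function([h_steered])[0])
```

The `alpha` parameter corresponds to the "steering strength" the abstract identifies as a crucial knob; in a real model, location (layer) and duration (how many generation steps the shift is applied) would be additional parameters.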
Wenxi Li
Universal Dependencies (UD), while widely regarded as the most successful linguistic framework for cross-lingual syntactic representation, remains underexplored in terms of its effectiveness. This paper addresses this gap by integrating UD into pretrained language models and assessing whether UD can improve their performance on a cross-lingual adversarial paraphrase identification task. Experimental results show that the incorporation of UD yields significant improvements in accuracy and $F_1$ scores, with average gains of 3.85\% and 6.08\% respectively. These enhancements reduce the performance gap between pretrained models and large language models in some language pairs, and even outperform the latter in others. Furthermore, the UD-based similarity score between a given language and English is positively correlated with the performance of models in that language. Both findings highlight the validity and potential of UD in out-of-domain tasks.
Brendon Boldt, David Mortensen
In this paper, we design a signalling game-based emergent communication environment to generate state-of-the-art emergent languages in terms of similarity to human language. This is done with hyperparameter optimization, using XferBench as the objective function. XferBench quantifies the statistical similarity of emergent language to human language by measuring its suitability for deep transfer learning to human language. Additionally, we demonstrate the predictive power of entropy on the transfer learning performance of emergent language as well as corroborate previous results on the entropy-minimization properties of emergent communication systems. Finally, we report generalizations regarding what hyperparameters produce more realistic emergent languages, that is, ones which transfer better to human language.
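The entropy measure invoked in this abstract can be illustrated with a short sketch. This is not XferBench or the authors' pipeline; it only shows, under assumed toy data, the unigram token entropy one might compute over an emergent-language corpus (the function name and corpora are illustrative).

```python
import math
from collections import Counter

def token_entropy(messages):
    """Shannon entropy (bits) of the unigram token distribution
    across a corpus of emergent-language messages."""
    counts = Counter(tok for msg in messages for tok in msg)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy corpora: a degenerate (low-entropy) language vs. a more varied one.
degenerate = [[0, 0, 0, 1]] * 50
varied = [[i % 7, (i * 3) % 7, (i * 5) % 7] for i in range(50)]
print(token_entropy(degenerate), token_entropy(varied))
```

A finding like the abstract's entropy-minimization result would relate a statistic of this kind to downstream transfer-learning performance on human language.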
William English, Dominic Simon, Sumit Kumar Jha et al.
Translating natural language (NL) into a formal language such as temporal logic (TL) is integral for human communication with robots and autonomous systems. State-of-the-art approaches decompose the task into a lifting of atomic propositions (APs) phase and a translation phase. However, existing methods struggle with accurate lifting, the existence of co-references, and learning from limited data. In this paper, we propose a framework for NL to TL translation called Grammar Forced Translation (GraFT). The framework is based on the observation that previous work solves both the lifting and translation steps by letting a language model iteratively predict tokens from its full vocabulary. In contrast, GraFT reduces the complexity of both tasks by restricting the set of valid output tokens from the full vocabulary to only a handful in each step. The solution space reduction is obtained by exploiting the unique properties of each problem. We also provide a theoretical justification for why the solution space reduction leads to more efficient learning. We evaluate the effectiveness of GraFT using the CW, GLTL, and Navi benchmarks. Compared with state-of-the-art translation approaches, GraFT improves the end-to-end translation accuracy by 5.49% and out-of-domain translation accuracy by 14.06% on average.
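The core idea of restricting the valid output tokens at each step can be sketched generically. This is not GraFT itself: the toy TL vocabulary, the `valid_ids` set, and the function name are assumptions purely for illustration of grammar-constrained decoding.

```python
import math

def constrained_softmax(logits, valid_ids):
    """Renormalize the model's scores over only the tokens the target
    grammar currently allows, instead of the full vocabulary."""
    masked = {i: logits[i] for i in valid_ids}
    z = max(masked.values())                       # stabilize the exponentials
    exps = {i: math.exp(v - z) for i, v in masked.items()}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

vocab = ["G", "F", "(", ")", "&", "p", "q"]        # toy TL token inventory
logits = [0.1, 2.0, 0.3, 5.0, 1.2, 0.7, 0.4]       # raw scores over full vocab
valid_ids = [0, 1, 5, 6]   # grammar: an operator or proposition may start a formula
probs = constrained_softmax(logits, valid_ids)
best = max(probs, key=probs.get)
# ")" has the highest raw logit but is grammatically excluded, so "F" wins.
print(vocab[best])
```

Restricting the candidate set this way shrinks the effective output space per step, which is the intuition behind the paper's claim of more efficient learning.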
Jingkai Li
Integrated Information Theory (IIT) provides a quantitative framework for explaining the phenomenon of consciousness, positing that conscious systems comprise elements integrated through causal properties. We apply IIT 3.0 and 4.0 -- the latest iterations of this framework -- to sequences of Large Language Model (LLM) representations, analyzing data derived from existing Theory of Mind (ToM) test results. Our study systematically investigates whether differences in ToM test performance, as reflected in the LLM representations, can be revealed by IIT estimates, i.e., $\Phi^{\max}$ (IIT 3.0), $\Phi$ (IIT 4.0), Conceptual Information (IIT 3.0), and $\Phi$-structure (IIT 4.0). Furthermore, we compare these metrics with the Span Representations, which are independent of any estimate of consciousness. This additional effort aims to differentiate between potential "consciousness" phenomena and inherent separations within LLM representational space. We conduct comprehensive experiments examining variations across LLM transformer layers and linguistic spans from stimuli. Our results suggest that sequences of contemporary Transformer-based LLM representations lack statistically significant indicators of observed "consciousness" phenomena but exhibit intriguing patterns under spatio-permutational analyses. The Appendix and code are available as Supplementary Materials at: https://doi.org/10.1016/j.nlp.2025.100163.
M. V. Senchenkova
Aim. To establish similarities and differences in nominal categories in Russian and French in comparative terms, and to identify convergences and discrepancies in the linguistic means used to express these categories.
Methodology. The comparative method served as the principal method for identifying similarities and differences between the linguistic means expressing the categories of grammatical gender and number in the two languages. Continuous sampling was used to compile the corpus of factual material, supplemented by description and interpretation of the comparison results, analysis and synthesis of various viewpoints on the nature of these grammatical categories, and semantic analysis in the interpretation of the factual material.
Results. The discrepancies and similarities between the compared languages in their nominal categories, as well as the means of their linguistic expression, have been identified. The morphological structure of nouns is much more complex in Russian than in French, which corresponds to a greater variety of means of grammatical expression.
Research implications. The analysis is significant and relevant for the comparative study of the two languages with respect to nominal categories and their linguistic expression. Its practical value lies in the possibility of using the results obtained to develop lecture and practical courses on comparative linguistics, as well as in the theory and practice of teaching French and Russian at the undergraduate, graduate and postgraduate levels in the disciplines of theoretical grammar and comparative typology.
Adão Pereira da Silva Barboza, Fernanda Rocha Bomfim
Translation studies pose challenges for those who wish to explore the paths of this art. Translating requires more than searching for an equivalent word in the languages involved; yet Translation plays an important role in the linguistic, critical and interpretive training of academics. It draws on several areas of knowledge, such as text linguistics, the grammars of the languages involved and Literature, bringing them together in the search for an understanding of form and the production of meaning. Although all of this is well known and indispensable, it can be said that the translation methods used in universities are limited to an empty vision of instant understanding, thus excluding the details of this artistic manifestation that were necessary for human evolution. Based on this assumption, a bibliographical study was carried out with the aim of showing that Translation can develop the skills proposed for academics through the study of its historicity, of source and target language, of form and meaning, and of an author's critical fortune, and finally through the practice of this theory in a comparative analysis of Brokeback Mountain by Annie Proulx and its translation by Adalgisa Campos da Silva. The linguistic resources used by the translator were identified, along with the changes made to achieve the meaning proposed by the author of the original work within the specific context that Annie Proulx took care to expose in her narrative.
Lina Al-Jarraḥ, Raeda Ammari, T. Farghal et al.
The present study investigates a prominent grammatical topic that has garnered considerable attention among grammarians, specifically the “prohibition of declension.” There is a widespread consensus among grammarians regarding certain words that do not undergo declension and instead function as comparative bases. The methodology employed in this study entails presenting perspectives from both ancient and contemporary grammarians on this subject. The researchers’ primary objective is to substantiate that the occurrence or absence of declension cannot be attributed to the reasons commonly posited by grammarians but can be explained by the principle of linguistic economy. This principle encompasses phonetic reduction or assimilation through phonetic analysis. By examining the data of declension cases categorically and qualitatively, the study illustrates how different syntactic contexts determine the inflection status of declension, highlighting that this phenomenon is a form of impoverishment that subjugates the Case to its morphological requirements. The study also highlights that declension involves an interface between morphology, phonology, and syntax. This interface incorporates plurality, proper nouns, and morphological sensitivity on the one hand while catering to phonological alterations of the Case-ending marker based on the syntactic position of the noun. Therefore, the study contributes to understanding the syntactic impoverishment of declension in Standard Arabic, highlighting that the non-application of a normative rule within grammar is universally mirrored in other cases in different languages, including over-generalization, irregularities, and idiosyncrasy. The study also supports the principle of economy, demonstrating that declension is economically formed through the choice vs. rejection of the optimal output within the syntactic context.
Li Ma
With the development of language research and language teaching, people have come to realize that grammatical competence is an important part of communicative competence. In foreign language teaching, grammar teaching is not only necessary but also the main way to achieve the goal of communicative competence. This article mainly studies a virtual reality (VR) technology-based immersive context teaching method for college English, built on artificial intelligence and machine learning, with the purpose of improving students’ English learning ability. In a comparative teaching experiment with two classes of freshmen at a university, the experimental class received VR technology-based immersive virtual context teaching from the perspective of constructivism, while the control class used common multimedia equipment and traditional teaching methods. In the traditional classroom, teachers occupy most of the time; students only passively receive large amounts of information, have little chance to participate in the exchange of information and express ideas in the target language, and spend most of the time “immersed” in a Chinese-language environment. The experimental class’s overall English level was also better than that of the control class, with an average score 2.8 points higher. This shows that college English immersive context teaching combining constructivist theory and VR technology can indeed improve students’ English level.
Amalia Amato, Fabrizio Gallai
Italy has recently been one of the main entry points for asylum seekers and refugees into Europe (UNHCR 2023). Credibility assessment of claims in asylum procedures heavily hinges on the applicants’ ability to (re)construct their refugee identity in written declarations and oral testimonies, which are in turn shaped and reshaped within the interaction in the further course of the procedure, not least by interpreters. Over the past 30 years, a growing number of publications has testified to the importance of asylum interpreters’ roles and ethics and shown that asylum interpreters rarely fulfil the expectations of normative role prescriptions. This paper aims to gain a better understanding of some critical aspects of interpreting in the asylum context in Italy, an understudied area of interpreting so far, mainly owing to the difficulty of accessing data. It is based on a combination of participant observation, semi-structured interviews with some of the participants in the hearings, and documentation about our dataset, which was collected at a Prefecture in central Italy in 2023. After an overview of the normative aspects of the right to asylum in the world and, more specifically, in Italy, we discuss the main issues concerning the complex profile and role of asylum interpreters and provide a description of the Italian international protection system. We then contextualise the dataset and the linguistic-ethnographic methods adopted to unravel the complex interactional dynamics under investigation. Based on our data analysis, we conclude that, in order to provide quality services, more specialised interpreter training is needed – not only in terms of language, legal knowledge and terminology, intercultural and communication skills, but also in terms of interviewing techniques and interactional mechanisms, as well as awareness of roles and respective boundaries in the asylum hearing.
Mihaela Buzec
The purpose of this paper is to explore the role of kennings in Old English (OE) poetry beyond their rhetorical power, more specifically, their role as mnemonic devices. Generally, kennings are used to refer to a certain entity in a more complex and descriptive way, using more than one individual tag. This way of encoding referents seems to carry more than aesthetic value for poets and bards. Seeing as Old English poetry is oral in nature, I believe there is an argument to be made for the use of specific structures that can aid word and context retrieval, especially in longer-form content. As such, kennings would function as anchors, and I argue that they function this way because they contain semantic information that supports word retrieval. The framework for analysing this type of word formation is based on semantic feature analysis, a protocol used in the therapy of aphasia and anomia to improve word retrieval in post-stroke patients. Beyond this analysis, the paper argues for the importance of considering alternate, novel techniques and methodologies for the study of Old English and for the diachronic study of language altogether, hoping to help bridge the gap between different areas of research.
Gabriele Gimmelli
For over sixty years, Saul Steinberg (1914-1999) explored with his pen an extraordinary variety of spaces, from the piazzas and porticoes of Italy to the main streets and brownstones of the United States. Places lived in and remembered, but also criticized. This essay sets out to retrace some stages of his “autogeography,” following as a guiding thread an idea of habitable space that prefers a continuous metamorphosis of functions to overly defined boundaries. Through his drawings, Steinberg offers a contribution, fragmentary in form but coherent in substance, to the twentieth-century debate on the urban environment, to be placed alongside those of Bernard Rudofsky and Le Corbusier.
Yinpei Dai, Jayjun Lee, Nima Fazeli et al.
Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grained language annotations for training. We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework, which combines failure recovery data with rich language descriptions to enhance robot control. RACER features a vision-language model (VLM) that acts as an online supervisor, providing detailed language guidance for error correction and task execution, and a language-conditioned visuomotor policy as an actor to predict the next actions. Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer (RVT) on RLbench across various evaluation settings, including standard long-horizon tasks, dynamic goal-change tasks and zero-shot unseen tasks, achieving superior performance in both simulated and real world environments. Videos and code are available at: https://rich-language-failure-recovery.github.io.
Michele Mastromattei, Fabio Massimo Zanzotto
This paper explores the correlation between linguistic diversity, sentiment analysis and transformer model architectures. We aim to investigate how different English variations impact transformer-based models for irony detection. To conduct our study, we used the EPIC corpus to extract five diverse English variation-specific datasets and applied the KEN pruning algorithm on five different architectures. Our results reveal several similarities between optimal subnetworks, which provide insights into the linguistic variations that share strong resemblances and those that exhibit greater dissimilarities. We discovered that optimal subnetworks across models share at least 60% of their parameters, emphasizing the significance of parameter values in capturing and interpreting linguistic variations. This study highlights the inherent structural similarities between models trained on different variants of the same language and also the critical role of parameter values in capturing these nuances.
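The subnetwork-overlap comparison reported in this abstract can be illustrated with a short sketch. This is not the KEN algorithm or the paper's measurement code: the masks are random toy data, and the overlap convention used here (shared parameters relative to the smaller retained set) is an assumption, since the paper's exact definition is not given in the abstract.

```python
import numpy as np

def subnetwork_overlap(mask_a, mask_b):
    """Fraction of parameters retained by BOTH pruning masks,
    relative to the smaller retained set (one common convention)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    shared = np.logical_and(a, b).sum()
    return shared / min(a.sum(), b.sum())

rng = np.random.default_rng(1)
base = rng.random(10_000) < 0.6        # parameters kept for one English variant
flip = rng.random(10_000) < 0.1        # small variant-specific differences
other = np.where(flip, ~base, base)    # second variant: mostly the same mask
print(round(subnetwork_overlap(base, other), 2))
```

A finding like the paper's "at least 60% shared parameters" would correspond to this statistic, computed pairwise over the optimal subnetworks found for each language variant.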
Eva Portelance, Masoud Jasbi
In the mid-20th century, the linguist Noam Chomsky established generative linguistics and made significant contributions to linguistics, computer science, and cognitive science by developing the computational and philosophical foundations for a theory that defined language as a formal system, instantiated in human minds or artificial machines. These developments in turn ushered in a wave of research on symbolic Artificial Intelligence (AI). More recently, a new wave of non-symbolic AI has emerged with neural Language Models (LMs) that exhibit impressive linguistic performance, leading many to question the older approach and wonder about the compatibility of generative AI and generative linguistics. In this paper, we argue that generative AI is compatible with generative linguistics and reinforces its basic tenets in at least three ways. First, we argue that LMs are formal generative models as intended originally in Chomsky's work on formal language theory. Second, LMs can help develop a program for discovery procedures as defined by Chomsky's "Syntactic Structures". Third, LMs can be a major asset for Chomsky's minimalist approach to Universal Grammar and language acquisition. In turn, generative linguistics can provide the foundation for evaluating and improving LMs as well as other generative computational models of language.
J. Archibald
In this paper I argue that cross-linguistic similarity in third language acquisition is determined by a structural hierarchy of contrastive phonological features. Such an approach allows us to formalize a predictive notion of I-proximity, which also provides an explanatory model of L2 and L3 phonological knowledge (represented in an integrated I-grammar). The metrics of phonological similarity (i.e., structural, not acoustic) are analogous to morphosyntactic similarity in that both morphosyntactic and phonological approaches can compare the outcomes of parsing the L3 input by the L1 hierarchy and by the L2 hierarchy. From this starting point I propose a conservative, incremental learning theory to guide subsequent reconstruction of the L3 grammar. Under this model, it can be argued that phonology is part of the Faculty of Language Narrow (FLN). The (gradient) phonetic material comes from outside the FLN, but the linguistic computational system converts it to discrete abstract elements that can be manipulated by the learner.
Manuel Lardelli, Dagmar Gromann
Page 22 of 221,889