Results for "Language. Linguistic theory. Comparative grammar"

Showing 20 of ~41751 results · from DOAJ, arXiv

DOAJ Open Access 2026
Malina in Poland – reception and translation analysis of selected fragments of the Polish translation

Angelika Lis

This article analyses Malina by Ingeborg Bachmann, focusing on its structure, themes, reception and translation into Polish. The author discusses the challenges faced by translator Sławomir Błaut, emphasising his fidelity to the original, linguistic precision and ability to convey Bachmann’s poetic and fragmentary style. The text indicates that the difficult language, literary innovation and cultural differences influenced the niche reception of the novel in Poland, despite its high artistic and research value.

Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2025
Nieprzekładalność jako mit: krytyczna rewizja pojęcia saudade

Gabriel Borowski

This article aims to discuss untranslatability as a myth, using the Portuguese concept of saudade as an example. The first part is dedicated to reconstructing the most significant moments in the development of the concept, with particular emphasis on reflections regarding its translatability into other languages. The second part proposes employing Barthes’ concept of myth in the reflection on untranslatability, which will be briefly illustrated with a contemporary example: the lyrics of the song representing Portugal at the 66th Eurovision Song Contest in Turin in 2022. The third part presents concluding remarks that also serve as a basis for further considerations on the issue of untranslatability, including its role in reinforcing unequal power relations between languages.

Translating and interpreting
arXiv Open Access 2025
CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing

Abdul Rehman, Jian-Jun Zhang, Xiaosong Yang

Universal phoneme recognition typically requires analyzing long speech segments and language-specific patterns. Many speech processing tasks require pure phoneme representations free from contextual influence, which motivated our development of CUPE - a lightweight model that captures key phoneme features in just 120 milliseconds, about one phoneme's length. CUPE processes short, fixed-width windows independently and, despite fewer parameters than current approaches, achieves competitive cross-lingual performance by learning fundamental acoustic patterns common to all languages. Our extensive evaluation through supervised and self-supervised training on diverse languages, including zero-shot tests on the UCLA Phonetic Corpus, demonstrates strong cross-lingual generalization and reveals that effective universal speech processing is possible through modeling basic acoustic patterns within phoneme-length windows.
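The framing step implied by CUPE's design, processing short, fixed-width windows independently, can be sketched as follows. This is a minimal illustration of windowing only, not the encoder itself; the sample rate and the exact window handling are assumptions.

```python
def frame_signal(samples, sample_rate=16000, window_ms=120):
    """Split a waveform into independent fixed-width windows.

    CUPE processes each ~120 ms window (roughly one phoneme) on its
    own; this sketch shows only the framing, not the model.
    """
    window_len = int(sample_rate * window_ms / 1000)  # samples per window
    # Keep complete windows only; drop the trailing remainder.
    return [samples[i:i + window_len]
            for i in range(0, len(samples) - window_len + 1, window_len)]

# One second of audio at 16 kHz yields 8 windows of 1920 samples each.
frames = frame_signal([0.0] * 16000)
print(len(frames), len(frames[0]))  # 8 1920
```

Because each window is processed in isolation, no context leaks between phoneme-length segments, which is the property the abstract emphasizes.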

en cs.CL, cs.LG
arXiv Open Access 2025
Can structural correspondences ground real world representational content in Large Language Models?

Iwan Williams

Large Language Models (LLMs) such as GPT-4 produce compelling responses to a wide range of prompts. But their representational capacities are uncertain. Many LLMs have no direct contact with extra-linguistic reality: their inputs, outputs and training data consist solely of text, raising the questions (1) can LLMs represent anything and (2) if so, what? In this paper, I explore what it would take to answer these questions according to a structural-correspondence based account of representation, and make an initial survey of this evidence. I argue that the mere existence of structural correspondences between LLMs and worldly entities is insufficient to ground representation of those entities. However, if these structural correspondences play an appropriate role - they are exploited in a way that explains successful task performance - then they could ground real world contents. This requires overcoming a challenge: the text-boundedness of LLMs appears, on the face of it, to prevent them engaging in the right sorts of tasks.

en cs.CL, cs.AI
DOAJ Open Access 2024
An examination of deictic terms in Tehmina's work My Feudal Lord

Jaweria Rehmat, Asma Khan, Komal Rafique

By applying Levinson's theoretical framework to the complicated domain of deictic phrases, this study examined and interpreted Tehmina Durrani's My Feudal Lord. The main linguistic component that serves as an anchor for conversation in specific spatiotemporal and social contexts was the deictic marker, which was the subject of this investigation. The goal of the study was to understand the different roles and effects of deictic markers in the novel's narrative setting. The pragmatic principles governing the use of deictic phrases in communication were clarified by this study's qualitative methodology, based on close textual analysis and drawing on Levinson's pragmatic theory. The purpose of the study was to examine how character relationships, narrative viewpoint, and sociocultural nuances in the text are affected by deictic pronouns, demonstratives, temporal adverbs, and spatial expressions. The quantitative component examined which deictic phrases were used most and least frequently in the work. This investigation's fundamental idea was to apply Levinson's framework for interpretation in order to make sense of the practical implications of deictic utterances in "My Feudal Lord." This study thoroughly examined how deictic indicators interact with verbal acts, communicative implicatures, and beliefs in an effort to uncover the text's hidden meanings and its resonance in the socio-political context of modern-day Pakistan. Furthermore, the study underscored the significance of utilising Levinson's theoretical framework as a lens through which to perceive the author's narrative strategies and situated the analysis within the broader context of text interpretation and literary discourse. In the end, this study offered a thorough examination of the deictic phrases in "My Feudal Lord," improving our understanding of the pragmatic elements of language usage in literary works.
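The quantitative tally of deictic markers the study describes can be sketched as a category-wise frequency count. The category lists below are a hypothetical mini-lexicon for illustration, not the study's actual inventory:

```python
import re
from collections import Counter

# Hypothetical mini-lexicon of deictic markers, grouped loosely
# by Levinson's categories (person, spatial, temporal deixis).
DEICTICS = {
    "person": {"i", "you", "we", "me", "us"},
    "spatial": {"here", "there", "this", "that"},
    "temporal": {"now", "then", "today", "yesterday"},
}

def count_deictics(text):
    """Tally deictic markers per category in a passage."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, markers in DEICTICS.items():
            if token in markers:
                counts[category] += 1
    return counts

sample = "I was there that day, and now you are here with me."
print(count_deictics(sample))
```

A real analysis would also need to separate deictic from non-deictic uses of ambiguous items like "that", which is where the qualitative, close-reading component comes in.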

English literature, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2024
"Suspension of disbelief" vs. "Secondary Belief": fictional worlds in Coleridge and Tolkien

Paolo Pizzimento

This article aims to analyse S.T. Coleridge’s theory of the suspension of disbelief and poetic faith, which seems to imply a conception of the literary work as displaying a “separate universe” capable of reconfiguring the experience of everyday reality. This theory, particularly through the mediation of Owen Barfield, exerted a considerable influence on J.R.R. Tolkien’s essay On Fairy-stories, which enters into a subtle controversy with Coleridge and opposes the suspension of disbelief with his “Secondary Belief”. The difference between the two authors can shed light on their dissimilar conceptions of the ontological status of fictional worlds.

Geography. Anthropology. Recreation, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2024
La política de la literatura en «Los Rembrandt de L´Hermitage» (1992)

Adianys González Herrera

Introduction: The poetry collection Los Rembrandt de L´Hermitage (1992) by Fina García Marruz is analysed as a «practice of non-belonging», a concept proposed by Florencia Garramuño (2015), and specifically as the practice of the «empire of the senses». Methods: The critical-interpretative methodology of Florencia Garramuño's thought, grounded in the work of Jacques Rancière, was applied in a textual analysis. Results: Each poem is characterised in order to argue that the collection is configured as an empire of the senses through ekphrasis. Conclusions: The collection reconfigures the sensible by dissolving the oppositions inside/outside, contemplation/action, divine/human, and extraordinary/everyday, through its exploration of paintings that provoke feeling both as sensation and as sentiment.

Philology. Linguistics, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2024
Entwicklung mehrsprachiger Kompetenz im DaF-Unterricht durch korpusbasierte Lernaufgaben.

Antonella Catone, Daniela Sorrentino

The present work aims at showing the didactic potential of the novel Zweinhalb Störche: Roman einer Kindheit in Siebenbürgen, written in 2008 by the German-Romanian author Claudiu M. Florian, for multilingual learning in GFL classes. The paper focuses on the possible use of corpus-based learning tasks and consists of two parts: the first will introduce the main features of multilingual didactics and multilingual competence and the possible teaching approaches of inter- and transcultural literature through the use of corpora; the second will focus on the use of the novel to promote multilingual competence within the GFL classroom. In particular, specific tasks will be suggested to achieve a didactic surplus through the use of corpus compilation and analysis tools such as Sketch Engine.

Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar
arXiv Open Access 2024
Construction of a Japanese Financial Benchmark for Large Language Models

Masanori Hirano

With the recent development of large language models (LLMs), the need for models that focus on particular domains and languages has been discussed. There is also a growing need for benchmarks to evaluate the performance of current LLMs in each domain. Therefore, in this study, we constructed a benchmark comprising multiple tasks specific to the Japanese and financial domains and performed benchmark measurements on some models. Consequently, we confirmed that GPT-4 is currently outstanding, and that the constructed benchmarks function effectively. According to our analysis, our benchmark can differentiate benchmark scores among models in all performance ranges by combining tasks with different difficulties.
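A multi-task benchmark of this kind ultimately reduces per-task results to comparable scores. As a minimal sketch, assuming macro-averaging over tasks (the task names and accuracies below are invented, not from the paper):

```python
def benchmark_score(task_scores):
    """Macro-average accuracy over benchmark tasks.

    Combining tasks of varying difficulty is what lets an aggregate
    score separate models across the whole performance range.
    """
    return sum(task_scores.values()) / len(task_scores)

# Hypothetical per-task accuracies for two models.
model_a = {"qa": 0.82, "sentiment": 0.90, "extraction": 0.74}
model_b = {"qa": 0.60, "sentiment": 0.71, "extraction": 0.48}
print(benchmark_score(model_a) > benchmark_score(model_b))  # True
```

Easy tasks separate weak models while hard tasks separate strong ones, so the mixture keeps the aggregate informative at both ends.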

en q-fin.CP, cs.CL
arXiv Open Access 2024
Scaling up Multimodal Pre-training for Sign Language Understanding

Wengang Zhou, Weichao Zhao, Hezhen Hu et al.

Sign language serves as the primary means of communication for the deaf-mute community. Unlike spoken language, it commonly conveys information through the collaboration of manual features, i.e., hand gestures and body movements, and non-manual features, i.e., facial expressions and mouth cues. To facilitate communication between deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied in recent years, including isolated/continuous sign language recognition (ISLR/CSLR), gloss-free sign language translation (GF-SLT) and sign language retrieval (SL-RT). Sign language recognition and translation aim to understand the semantic meaning conveyed by sign languages at the gloss level and sentence level, respectively. In contrast, SL-RT focuses on retrieving sign videos or corresponding texts from a closed set under the query-by-example search paradigm. These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representations of sign language videos. To advance the development of sign language understanding, exploring a generalized model that is applicable across various SLU tasks is a profound research direction.

en cs.CV, cs.MM
DOAJ Open Access 2023
Nie tylko Bulla gnieźnieńska. O znaczeniu dokumentu fundacyjnego Zbyluta (1153) dla polskich badań historycznojęzykowych

Marcin Kuźmicki, Tomasz Mika

NOT ONLY BULL OF GNIEZNO: ON THE IMPORTANCE OF ZBYLUT’S FOUNDATION DOCUMENT (1153) FOR POLISH HISTORICAL AND LINGUISTIC RESEARCH. PART 3: TOWARDS TRANSCRIPTION This article is the third part of a series dedicated to the documents sharing the name of “Zbylut’s foundation document”. The aim of the first part was to bring the three 12th-century documents into the field of interest of philologists and to review the historical literature with a view to identifying threads that may be relevant to philological (historical-linguistic) research, while the second part brought an answer to the question of the relationship between the surviving documents, the degree of imitation between them, and the number of writers who were their authors. The present, third part shows how complicated (and sometimes impossible) it is to get to the wording of the Zbylut document’s records and what procedures are used to do so. A comparison and analysis of selected records from the oldest three documents indicates the place of graphic substitution, the custom of redrawing records and the Latinization of records in the hierarchy of successive filters imposed on the record. We have also tried to show how making different assumptions leads to completely different readings of the same name and to different hypotheses about the presumed sound of its name. We have also discussed the consequences of “translating” a name written in simple graphics into the modern alphabet with which, according to the essence of transcription, ancient phonetic features were to be expressed, and which, as we often get the impression, is no better suited for this than the graphics used by 12th-century scribes. We also took up the theme of the indispensable cooperation of historians and philologists in finding out the truth hidden in the oldest records of Polish proper names.

Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2023
Stem configurations, lexical items, and phonological words in Maltese

Gilbert PUECH

Maltese is the 'national' language of the people in the sister-islands of Malta and Gozo. Originally Arabic, Maltese vocabulary has been massively expanded by borrowings from Romance (Sicilian/Italian), and more recently English. Influential contributions in (post)generative phonology and Optimality Theory claim that Maltese phonology is based on the interaction of (Palestinian Arabic type) stress assignment with syncope, and cyclic application of rules/constraints. This approach fails in some cases for Arabic Maltese, and is inoperative for borrowed vocabulary. I argue for distributing morphological constituents in three domains. The stem-domain heads a radical base, to which a preformative morph may be incorporated, and inflectional circumfixes. The Lexical-item domain heads the stem-node to which (in)direct pronominal objects may be concatenated. Clitics are adjoined to the phonological-word domain. All exponents are mapped on the linearized segmental tier. Trochaic stems satisfying morpho-lexical constraints are built in a pre-lexical phase. Vocalism is underlyingly specified or assigned by default, including ‘reverse-imāla’. In a lexical phase, OCP and Licensing shape syllabic stem-profiles to satisfy morpho-prosodic constraints. In the post-lexical phase, surface forms are generated by application of phonological processes: stress assignment, vowel length and quality, voice alteration in obstruents. A model of 'Weak CV Phonology', distantly related to Strict CV Phonology, is drawn up. Segmental representations are analyzed in monovalent elements. Maltese has often been presented as a 'mixed' language with two strata of vocabulary and two morphologies: root-and-pattern, non-concatenative for template-bound vocabulary, concatenative for loan-words. I claim that Maltese consistently requires templatic, concatenative and word-based morphology.

Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2023
Examining a technology-focused language teacher community on Facebook during a crisis situation

Yurika Ito

Due to the chaos and confusion caused by the sudden transition from face-to-face teaching to online and remote teaching in early 2020, numerous language teachers had no choice but to rely on online communities on social networking sites. The current study therefore examined how some language teachers were utilising online communities on Facebook during the COVID-19 pandemic. Employing a mixed-methods approach, data were mainly collected through: (1) an eight-month observation of a technology-focused language teacher community on Facebook to identify different types of posts generated by its members before and during the COVID-19 pandemic (n = 340); (2) a questionnaire to understand the community members’ backgrounds and experiences of being in the community (n = 51); (3) semi-structured interviews with some of the questionnaire participants (n = 13); and (4) a post-interview questionnaire (n = 12) to get a better understanding of their responses. A content analysis of online posts and community members’ responses suggests that language teacher communities on Facebook were supporting teachers professionally and emotionally during the stressful periods of the pandemic. The main findings are discussed in terms of the benefits and drawbacks of using online language teacher communities for professional purposes. The overall goal of the study is to offer much-needed answers on how pre-existing communities can be used to assist language teachers in times of crisis.

Special aspects of education, Language acquisition
arXiv Open Access 2023
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models

Jiaying Lu, Jinmeng Rao, Kezhen Chen et al.

Large Vision-Language Models (LVLMs) offer remarkable benefits for a variety of vision-language tasks. However, a challenge hindering their application in real-world scenarios, particularly regarding safety, robustness, and reliability, is their constrained semantic grounding ability, which pertains to connecting language to the physical-world entities or concepts referenced in images. Therefore, a crucial need arises for a comprehensive study to assess the semantic grounding ability of widely used LVLMs. Despite the significance, sufficient investigation in this direction is currently lacking. Our work bridges this gap by designing a pipeline for generating large-scale evaluation datasets covering fine-grained semantic information, such as color, number, material, etc., along with a thorough assessment of seven popular LVLMs' semantic grounding ability. Results highlight prevalent misgrounding across various aspects and degrees. To address this issue, we propose a data-centric enhancement method that aims to improve LVLMs' semantic grounding ability through multimodal instruction tuning on fine-grained conversations. Experiments on enhanced LVLMs demonstrate notable improvements in addressing misgrounding issues.
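A pipeline that generates fine-grained grounding probes from structured annotations might look like the sketch below. The annotation schema, field names, and question templates are all hypothetical; the paper's actual pipeline operates at a much larger scale:

```python
# Hypothetical image annotations; a real pipeline derives these at scale.
annotations = [
    {"object": "car", "color": "red", "number": 2, "material": "metal"},
    {"object": "cup", "color": "blue", "number": 1, "material": "ceramic"},
]

def make_probes(record):
    """Turn one annotation into fine-grained question-answer probes
    targeting color, number, and material grounding."""
    return [
        (f"What color is the {record['object']}?", record["color"]),
        (f"How many {record['object']}s are there?", str(record["number"])),
        (f"What material is the {record['object']} made of?", record["material"]),
    ]

dataset = [probe for record in annotations for probe in make_probes(record)]
print(len(dataset))  # 6 question-answer pairs
```

Comparing an LVLM's answers against the gold answers per attribute then localizes misgrounding to specific aspects (e.g. correct on color but wrong on number).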

en cs.CV, cs.CL
arXiv Open Access 2023
Demystifying Instruction Mixing for Fine-tuning Large Language Models

Renxi Wang, Haonan Li, Minghao Wu et al.

Instruction tuning significantly enhances the performance of large language models (LLMs) across various tasks. However, the procedure for optimizing the mix of instruction datasets for LLM fine-tuning is still poorly understood. This study categorizes instructions into three primary types: NLP downstream tasks, coding, and general chat. We explore how instruction tuning on different combinations of these datasets affects LLM performance, and find that certain instruction types are more advantageous for specific applications but can negatively impact other areas. This work provides insights into instruction mixtures, laying the foundations for future research.
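The experimental variable here, the proportion of each instruction type in the fine-tuning mix, can be sketched as weighted sampling from typed pools. The pools, weights, and prompts below are invented for illustration:

```python
import random

def mix_instructions(pools, weights, n, seed=0):
    """Draw a fine-tuning mixture from typed instruction pools.

    `weights` sets the share of each instruction type (NLP tasks,
    coding, general chat); the finding above is that these shares
    matter, so the ratio is the knob being studied.
    """
    rng = random.Random(seed)
    types = list(pools)
    chosen = rng.choices(types, weights=[weights[t] for t in types], k=n)
    return [(t, rng.choice(pools[t])) for t in chosen]

pools = {
    "nlp": ["Summarize this article.", "Label the sentiment."],
    "code": ["Write a sort function.", "Fix this bug."],
    "chat": ["Tell me about Kyoto.", "Plan a dinner party."],
}
batch = mix_instructions(pools, {"nlp": 0.5, "code": 0.3, "chat": 0.2}, n=10)
print(len(batch))  # 10 (type, instruction) pairs
```

Sweeping the weight vector and fine-tuning on each resulting mixture is the kind of controlled comparison the study reports.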

en cs.CL, cs.AI
arXiv Open Access 2023
Negated Complementary Commonsense using Large Language Models

Navid Rezaei, Marek Z. Reformat

Larger language models, such as GPT-3, have been shown to excel at many tasks. However, we demonstrate that out-of-the-ordinary questions can throw the model off guard. This work focuses on finding answers to negated complementary questions in commonsense scenarios. We illustrate how such questions adversely affect the model responses. We propose a model-agnostic methodology to improve the performance in negated complementary scenarios. Our method outperforms few-shot generation from GPT-3 (by more than 11 points) and, more importantly, highlights the significance of studying the response of large language models in negated complementary questions. The code, data, and experiments are available under: https://github.com/navidre/negated_complementary_commonsense.
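To make the question type concrete: a negated complementary question asks for the complement of the usual answer set. The template below is a toy string transformation for illustration only; it is not the paper's method, which is a model-agnostic prompting strategy:

```python
def negate_complementary(question, predicate):
    """Turn a plain commonsense question into its negated complement.

    E.g. "which is a mammal?" becomes "which is not a mammal?" --
    the answer set flips to everything *outside* the original category.
    """
    return question.replace(f"is {predicate}", f"is not {predicate}")

q = "Which of these animals is a mammal?"
print(negate_complementary(q, "a mammal"))
```

The paper's observation is that models answering the plain form correctly often fail on the negated form, even though the two are logically linked.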

en cs.CL, cs.AI
arXiv Open Access 2022
Mitigating Covertly Unsafe Text within Natural Language Systems

Alex Mei, Anisha Kabir, Sharon Levy et al.

An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences. However, the degree of explicitness of a generated statement that can cause physical harm varies. In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text. Then, we further break down this category with respect to the system's information and discuss solutions to mitigate the generation of text in each of these subcategories. Ultimately, our work defines the problem of covertly unsafe language that causes physical harm and argues that this subtle yet dangerous issue needs to be prioritized by stakeholders and regulators. We highlight mitigation strategies to inspire future researchers to tackle this challenging problem and help improve safety within smart systems.

en cs.AI, cs.CL
arXiv Open Access 2022
A generative grammar of cooking

Ganesh Bagler

Cooking is a uniquely human endeavor for transforming raw ingredients into delicious dishes. Over centuries, cultures worldwide have evolved diverse cooking practices ingrained in their culinary traditions. Recipes, thus, are cultural capsules that capture culinary knowledge in elaborate cooking protocols. While simple quantitative models have probed the patterns in recipe composition and the process of cuisine evolution, unlike other cultural quirks such as language, the principles of cooking remain hitherto unexplored. The fundamental rules that drive the act of cooking, shaping recipe composition and cuisine architecture, are unclear. Here we present a generative grammar of cooking that captures the underlying culinary logic. By studying an extensive repository of structured recipes, we identify core concepts and rules that together forge a combinatorial system for culinary synthesis. Building on the body of work done in the context of language, the demonstration of a logically consistent generative framework offers profound insights into the act of cooking. Given the central role of food in nutrition and lifestyle disorders, culinary grammar provides leverage to improve public health through dietary interventions beyond applications for creative pursuits such as novel recipe generation.
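A generative grammar of this kind can be illustrated with a toy rule system: nonterminals expand into sequences of techniques and ingredients until only concrete steps remain. The rules below are invented, not the paper's actual grammar:

```python
import random

# Toy generative grammar of cooking: each nonterminal maps to a list
# of possible expansions; strings absent from the table are terminals
# (concrete cooking steps).
GRAMMAR = {
    "recipe": [["prep", "cook", "finish"]],
    "prep": [["chop onion"], ["dice tomato"]],
    "cook": [["saute", "prep"], ["boil"]],
    "finish": [["season with salt"], ["garnish with herbs"]],
}

def generate(symbol, rng):
    """Recursively expand a symbol into a flat list of cooking steps."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: an actual instruction
    steps = []
    for part in rng.choice(GRAMMAR[symbol]):
        steps.extend(generate(part, rng))
    return steps

print(generate("recipe", random.Random(1)))
```

The combinatorics are the point: a small rule set licenses many well-formed recipes, which is the sense in which cooking resembles a generative linguistic system.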

en physics.soc-ph, cs.AI
arXiv Open Access 2022
Toward More Meaningful Resources for Lower-resourced Languages

Constantine Lignos, Nolan Holley, Chester Palen-Michel et al.

In this position paper, we describe our perspective on how meaningful resources for lower-resourced languages should be developed in connection with the speakers of those languages. We first examine two massively multilingual resources in detail. We explore the contents of the names stored in Wikidata for a few lower-resourced languages and find that many of them are not in fact in the languages they claim to be and require non-trivial effort to correct. We discuss quality issues present in WikiAnn and evaluate whether it is a useful supplement to hand annotated data. We then discuss the importance of creating annotation for lower-resourced languages in a thoughtful and ethical way that includes the languages' speakers as part of the development process. We conclude with recommended guidelines for resource development.
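One way to audit whether a Wikidata label is plausibly in its claimed language is a script check, since a label claimed to be Amharic, say, should be mostly Ethiopic script. This is a crude heuristic in the spirit of the audit, not the authors' actual procedure:

```python
import unicodedata

def dominant_script(name):
    """Guess the writing script of a name from its characters.

    Unicode character names begin with the script name (e.g.
    "LATIN CAPITAL LETTER A", "ETHIOPIC SYLLABLE GLOTTAL A"),
    which gives a cheap script classifier for alphabetic text.
    """
    scripts = {}
    for ch in name:
        if ch.isalpha():
            script = unicodedata.name(ch).split()[0]
            scripts[script] = scripts.get(script, 0) + 1
    return max(scripts, key=scripts.get) if scripts else "UNKNOWN"

print(dominant_script("Addis Ababa"))  # LATIN
print(dominant_script("አዲስ አበባ"))     # ETHIOPIC
```

Such a check only flags the grossest mismatches (e.g. a Latin-script English name filed under a non-Latin-script language); as the paper stresses, real verification needs speakers of the language.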

en cs.CL, cs.AI
arXiv Open Access 2020
Selecting Informative Contexts Improves Language Model Finetuning

Richard Antonello, Nicole Beckage, Javier Turek et al.

Language model fine-tuning is essential for modern natural language processing, but is computationally expensive and time-consuming. Further, the effectiveness of fine-tuning is limited by the inclusion of training examples that negatively affect performance. Here we present a general fine-tuning method that we call information gain filtration for improving the overall training efficiency and final performance of language model fine-tuning. We define the information gain of an example as the improvement on a test metric after training on that example. A secondary learner is then trained to approximate this quantity. During fine-tuning, this learner selects informative examples and skips uninformative ones. We show that our method has consistent improvement across datasets, fine-tuning tasks, and language model architectures. For example, we achieve a median perplexity of 54.0 on a books dataset compared to 57.3 for standard fine-tuning. We present statistical evidence that offers insight into the improvements of our method over standard fine-tuning. The generality of our method leads us to propose a new paradigm for language model fine-tuning -- we encourage researchers to release pretrained secondary learners on common corpora to promote efficient and effective fine-tuning, thereby improving the performance and reducing the overall energy footprint of language model fine-tuning.
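The filtering loop described above, skip examples whose predicted information gain is too low, can be sketched as follows. The length-based scorer stands in for the trained secondary learner and is purely a hypothetical placeholder:

```python
def filtered_finetune(examples, predicted_gain, train_step, threshold=0.0):
    """Fine-tune only on examples predicted to be informative.

    In the paper, a secondary learner approximates the true information
    gain of an example (the test-metric improvement from training on
    it); here a toy length-based scorer stands in for that learner.
    """
    kept = 0
    for example in examples:
        if predicted_gain(example) > threshold:
            train_step(example)  # would update the language model here
            kept += 1
    return kept

examples = ["short", "a much longer informative example", "tiny"]
predicted_gain = lambda ex: len(ex) - 10  # hypothetical secondary learner
trained = []
kept = filtered_finetune(examples, predicted_gain, trained.append)
print(kept)  # 1
```

Because the secondary learner is cheap to evaluate, the filter's cost is small relative to the gradient steps it saves, which is where the efficiency gain comes from.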

en cs.CL

Page 46 of 2088