Results for "Language. Linguistic theory. Comparative grammar"

Showing 20 of ~4433242 results · from DOAJ, CrossRef, arXiv

DOAJ Open Access 2026
Charting Virtual Worlds: Training in Game Translation and Localization at a Brazilian University

Marileide Dias Esqueda, Igor Antônio Lourenço da Silva

The escalating global prominence of the game industry underscores the critical need for specialized translation and localization expertise. This article presents a case study of the Undergraduate Program in Translation at Universidade Federal de Uberlândia, a Brazilian public university that has been providing dedicated training in game localization for the last 15 years. The core of the program's offering is a 60-hour course that spans foundational localization concepts such as transcreation and culturalization, as well as advanced technical skills, including computer-assisted translation, machine translation, and generative artificial intelligence. Its pedagogical approach integrates theoretical knowledge with practical application through lectures, workshops, and game localization projects, frequently leveraging open-source resources. Students gain experience by collaborating in teams and emulating professional workflows. Further enhancing the training, students complete a senior thesis, often focused on game localization, in which they detail their experiences in technically adapting games. This research draws upon existing literature and employs an autoethnographic approach, critically reflecting on our experiences as trainers and thesis supervisors through personal archives, didactic materials, and learning tasks. Complementarily, a bibliometric and content analysis of past senior theses provides empirical data. A notable challenge addressed is the scarcity of game localization-specific teaching materials, prompting the program's proactive development of resources tailored to the Brazilian market. This study thus aims to contribute to specialized localization training.

Translating and interpreting
DOAJ Open Access 2026
Presuppositions in Descriptive Utterances on Kawula-Gusti of the Song Ingsun as Alternative Learning Materials for Javanese Language in Junior High Schools

Angelica Wahyu Kartika Budiarti, Sugeng Adipitoyo, Ahmad Rizky Wahyudi

Presupposition in descriptive utterances functions as an effective linguistic strategy for subtly instilling philosophical and theological assumptions, as manifested in the contemporary song "Ingsun" by Sujiwo Tejo. This study aims to examine the forms and functions of presupposition that construct the concept of Kawula-Gusti (Servant-God relationship) in the song's lyrics, and to analyze its relevance as Javanese language teaching material in Junior High Schools. This research employs a descriptive qualitative approach with data collection techniques utilizing listening and note-taking, based on the synthesis of presupposition theories by Stalnaker, Karttunen, and Yule, combined with Austin's locutionary acts and Keraf’s descriptive theory. The findings indicate that the lyrics are dominated by lexical and existential presuppositions which implicitly instill a profound understanding of Dununge (Position), Kuwasane (Authority), and Nuju Gambuhe (Union) of the Kawula-Gusti. The descriptive utterances require the listener's cognitive accommodation to accept theological truths as background facts without rigid indoctrination. These findings have strong pedagogical relevance for Javanese Language learning at the JHS Phase D level within the Merdeka Curriculum framework, particularly for training students' interpretive abilities toward implicit meaning and strengthening character based on the Pancasila Student Profile. However, acknowledging that the reliance on a single culturally and theologically dense text limits generalizability across diverse learner backgrounds, this study recommends extending the analytical framework to multiple Javanese texts of varying genres and difficulty levels to ensure broader applicability and instructional flexibility.

Philology. Linguistics, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2025
Zwischen Anspruch und Wirklichkeit: Die sinkende Nachfrage nach Deutsch als Fremdsprache in Dänemark und Norwegen [Between Aspiration and Reality: The Declining Demand for German as a Foreign Language in Denmark and Norway]

Karen Bauer, Beate Lindemann

This study investigates the situation of German language education in Denmark and Norway. The focus is on the causes of declining interest and the measures taken by educational authorities to promote German language skills. The research addresses the measures proposed and recommended by national educational authorities, how such measures are implemented in educational institutions, and how well these measures are known and applied by local experts, such as German teachers and staff working in German language education and teacher education at colleges and universities. The study employed an online survey methodology: more than 400 teachers and 40 employees from universities and colleges responded, and qualitative content analysis was then applied to the resulting data. The findings reveal a clear discrepancy in the practical application of governmental documents within school and higher education settings. The study recommends providing specific German language curricula, ongoing teacher training, and sustained funding for innovative language education practices, and it emphasizes the need for practical implementation of government strategies and long-term support for educational initiatives.

Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2025
Bi-uniqueness violation in Old and Modern English personal pronouns: How and by which pronouns?

Alireza Mahmoodi

The aim of this study is to investigate the naturalness and markedness of Old and Modern English personal pronouns through the bi-uniqueness parameter, one of the parameters of natural morphology theory. The results showed that neither language violated bi-uniqueness in the first person, but violations were observed in the second and third persons. In Modern English the pronouns you and it, and in Old English the pronouns þē, inc, ēow, hī, hēo, him, hit, and his violated bi-uniqueness and are therefore unnatural and marked. Old English was also observed to violate bi-uniqueness more than Modern English.

Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2025
Wisława Szymborska in Ost und West: Übersetzungen im Vergleich [Wisława Szymborska in East and West: Translations Compared]

Andrea Meyer-Fraatz

In both East and West Germany, Wisława Szymborska was discovered early and published in translations by various translators in numerous journals and anthologies. By 1990, the year of reunification, three volumes of her poems were published in West Germany, translated and edited by Karl Dedecius, whereas in East Germany only one book was published, edited and translated by Jutta Janke. This article offers an analysis of these publications in both German states, focussing on which poems were included in which publications and which poems were not published in either state. Finally, one of the poems, translated and published in both German states, “Dwie małpy Bruegla” [“Brueghel’s two monkeys”], is compared using the so-called Göttingen approach to translation research. This methodological approach assumes that the differences between the source text and the target text can provide indications of the conditions under which the respective translations were written, in order to find out to what extent the translations differ in the Federal Republic of Germany and the German Democratic Republic. Although individual poems might have been chosen for ideological reasons, the assumption that differences in translations of the same poem could be due to ideological factors cannot be confirmed in the case of this particular poem.

Translating and interpreting
arXiv Open Access 2025
A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts

Kian Tohidi, Kia Dashtipour, Simone Rebora et al.

This study presents a comprehensive comparative evaluation of four state-of-the-art Large Language Models (LLMs)--Claude 3.7 Sonnet, DeepSeek-V3, Gemini 2.0 Flash, and GPT-4o--for sentiment analysis and emotion detection in Persian social media texts. Comparative analysis among LLMs has witnessed a significant rise in recent years; however, most of these analyses have been conducted on English-language tasks, creating gaps in understanding cross-linguistic performance patterns. This research addresses these gaps through a rigorous experimental design using balanced Persian datasets containing 900 texts for sentiment analysis (positive, negative, neutral) and 1,800 texts for emotion detection (anger, fear, happiness, hate, sadness, surprise). The main focus was to allow a direct and fair comparison among the models by using consistent prompts and uniform processing parameters, and by analyzing performance metrics such as precision, recall, and F1-scores, along with misclassification patterns. The results show that all models reach an acceptable level of performance, and a statistical comparison of the best three models indicates no significant differences among them. However, GPT-4o demonstrated marginally higher raw accuracy for both tasks, while Gemini 2.0 Flash proved the most cost-efficient. The findings indicate that emotion detection is more challenging for all models than sentiment analysis, and that the misclassification patterns reflect particular challenges of Persian-language texts. These findings establish performance benchmarks for Persian NLP applications and offer practical guidance for model selection based on accuracy, efficiency, and cost considerations, while revealing cultural and linguistic challenges that require consideration in multilingual AI system deployment.

en cs.CL
arXiv Open Access 2025
Adding Alignment Control to Language Models

Wenhong Zhu, Weinan Zhang, Rui Wang

Post-training alignment has increasingly become a crucial factor in enhancing the usability of language models (LMs). However, the desired strength of alignment varies with individual preferences. This paper proposes a method to incorporate alignment control into a single model, referred to as CLM. The approach adds one identity layer preceding the initial layers and performs preference learning only on this layer to map unaligned input token embeddings into the aligned space. Experimental results demonstrate that this efficient fine-tuning method performs comparably to full fine-tuning. During inference, the input embeddings are processed through both the aligned and unaligned layers, and the results are merged through an interpolation coefficient. By controlling this parameter, the alignment exhibits a clear interpolation and extrapolation phenomenon.
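The abstract names the merging mechanism only at a high level. As a minimal sketch, assuming a simple linear blend, this is how an interpolation coefficient could merge unaligned and aligned token embeddings; the function name and toy vectors below are illustrative assumptions, not CLM's actual implementation.

```python
import numpy as np

def interpolate_alignment(unaligned: np.ndarray, aligned: np.ndarray,
                          alpha: float) -> np.ndarray:
    """Blend unaligned and aligned token embeddings.

    alpha = 0 gives the fully unaligned embedding, alpha = 1 the fully
    aligned one; values outside [0, 1] extrapolate beyond either behaviour.
    """
    return (1.0 - alpha) * unaligned + alpha * aligned

# Toy 4-dimensional embeddings for a single token.
unaligned = np.array([1.0, 0.0, 0.0, 0.0])
aligned = np.array([0.0, 1.0, 0.0, 0.0])

half = interpolate_alignment(unaligned, aligned, 0.5)
print(half)  # midway between the two embeddings
```

Setting alpha above 1 would push the output past the aligned embedding, which is one plausible reading of the "extrapolation phenomenon" the abstract mentions.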

en cs.CL
arXiv Open Access 2025
G2rammar: Bilingual Grammar Modeling for Enhanced Text-attributed Graph Learning

Heng Zheng, Haochen You, Zijun Liu et al.

Text-attributed graphs require models to effectively integrate both structural topology and semantic content. Recent approaches apply large language models to graphs by linearizing structures into token sequences through random walks. These methods create concise graph vocabularies to replace verbose natural language descriptions. However, they overlook a critical component that makes language expressive: grammar. In natural language, grammar assigns syntactic roles to words and defines their functions within sentences. Similarly, nodes in graphs play distinct structural roles as hubs, bridges, or peripheral members. Current graph language methods provide tokens without grammatical annotations to indicate these structural or semantic roles. This absence limits language models' ability to reason about graph topology effectively. We propose G2rammar, a bilingual grammar framework that explicitly encodes both structural and semantic grammar for text-attributed graphs. Structural grammar characterizes topological roles through centrality and neighborhood patterns. Semantic grammar captures content relationships through textual informativity. The framework implements two-stage learning with structural grammar pre-training followed by semantic grammar fine-tuning. Extensive experiments on real-world datasets demonstrate that G2rammar consistently outperforms competitive baselines by providing language models with the grammatical context needed to understand graph structures.

en cs.GR
arXiv Open Access 2025
Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning

Fred Philippy, Siwen Guo, Cedric Lothritz et al.

In NLP, Zero-Shot Classification (ZSC) has become essential for enabling models to classify text into categories unseen during training, particularly in low-resource languages and domains where labeled data is scarce. While pretrained language models (PLMs) have shown promise in ZSC, they often rely on large training datasets or external knowledge, limiting their applicability in multilingual and low-resource scenarios. Recent approaches leveraging natural language prompts reduce the dependence on large training datasets but struggle to effectively incorporate available labeled data from related classification tasks, especially when these datasets originate from different languages or distributions. Moreover, existing prompt-based methods typically rely on manually crafted prompts in a specific language, limiting their adaptability and effectiveness in cross-lingual settings. To address these challenges, we introduce RoSPrompt, a lightweight and data-efficient approach for training soft prompts that enhance cross-lingual ZSC while ensuring robust generalization across data distribution shifts. RoSPrompt is designed for small multilingual PLMs, enabling them to leverage high-resource languages to improve performance in low-resource settings without requiring extensive fine-tuning or high computational costs. We evaluate our approach on multiple multilingual PLMs across datasets covering 106 languages, demonstrating strong cross-lingual transfer performance and robust generalization capabilities over unseen classes.
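As a rough illustration of the soft-prompt idea underlying RoSPrompt (not the paper's actual code), a soft prompt is a small matrix of trainable vectors prepended to the frozen embedding sequence, so that only those few vectors need updating during training. All names and sizes below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, PROMPT_LEN = 100, 8, 4
token_embeddings = rng.normal(size=(VOCAB, DIM))  # frozen PLM embedding table
soft_prompt = rng.normal(size=(PROMPT_LEN, DIM))  # the only trainable parameters

def embed_with_soft_prompt(token_ids):
    """Prepend the trainable soft-prompt vectors to the frozen token embeddings."""
    tokens = token_embeddings[token_ids]
    return np.concatenate([soft_prompt, tokens], axis=0)

seq = embed_with_soft_prompt([5, 17, 42])
print(seq.shape)  # sequence grows by PROMPT_LEN positions
```

Because the prompt lives in embedding space rather than in any particular language, the same trained vectors can be reused across languages, which is what makes the approach attractive for cross-lingual transfer.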

en cs.CL, cs.AI
arXiv Open Access 2025
The Illusionist's Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances

Yining Wang, Yuquan Wang, Xi Li et al.

As Large Language Models (LLMs) continue to advance, they are increasingly relied upon as real-time sources of information by non-expert users. To ensure the factuality of the information they provide, much research has focused on mitigating hallucinations in LLM responses, but only in the context of formal user queries, rather than maliciously crafted ones. In this study, we introduce The Illusionist's Prompt, a novel hallucination attack that incorporates linguistic nuances into adversarial queries, challenging the factual accuracy of LLMs against five types of fact-enhancing strategies. Our attack automatically generates highly transferrable illusory prompts to induce internal factual errors, all while preserving user intent and semantics. Extensive experiments confirm the effectiveness of our attack in compromising black-box LLMs, including commercial APIs like GPT-4o and Gemini-2.0, even with various defensive mechanisms.

en cs.CL, cs.LG
DOAJ Open Access 2024
The impact of pronouns on the reception of supernatural creatures in Sapkowski’s short stories

Michalina Piotrowska

This paper seeks to analyse two translations of Andrzej Sapkowski’s books Ostatnie życzenie (The Last Wish) and Miecz przeznaczenia (Sword of Destiny) with the focus on pronouns used by the author and translators (Danusia Stok and David French) in relation to the supernatural creatures presented in the short stories. The text deals with the impact these pronouns can have on the perception of such creatures. The changes introduced in the translation of both texts and their possible consequences are discussed. The manipulation of certain terms may result in the humanization or dehumanization of supernatural creatures, which may – in turn – result in different effects the story has on the reader.

Language. Linguistic theory. Comparative grammar
arXiv Open Access 2024
From Effectiveness to Efficiency: Uncovering Linguistic Bias in Large Language Model-based Code Generation

Weipeng Jiang, Xuanqi Gao, Juan Zhai et al.

Large Language Models (LLMs) have demonstrated promising capabilities for code generation. While existing benchmarks evaluate the correctness and efficiency of LLM-generated code, potential linguistic bias - where code quality varies based on the natural language used to describe programming tasks - remains underexplored. In this paper, we investigate this linguistic bias through the lens of English and Chinese. To facilitate our investigation, we present a unified evaluation framework comprising a curated dataset of 52 Python programming questions with parallel bilingual task descriptions, automated correctness verification, and efficiency quantification tools based on runtime complexity estimation. Based on this framework, we conduct the first empirical study of linguistic bias in LLM-generated code on eight popular LCGMs, as well as GPT-3.5-Turbo and GPT-4. We observe that the code generated by these LCGMs differs in correctness across the two languages on an average of 12% of the bilingual programming tasks, and that 39% of those also exhibit differing efficiency. Our findings indicate that LLMs commonly exhibit linguistic bias in code generation.

en cs.SE, cs.PL
arXiv Open Access 2024
Self-Cognition in Large Language Models: An Exploratory Study

Dongping Chen, Jiawen Shi, Yao Wan et al.

While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate whether an LLM exhibits self-cognition, along with four well-designed principles to quantify LLMs' self-cognition. Our study reveals that 4 of the 48 models on Chatbot Arena--specifically Command R, Claude3-Opus, Llama-3-70b-Instruct, and Reka-core--demonstrate some level of detectable self-cognition. We observe a positive correlation between model size, training data quality, and self-cognition level. Additionally, we explore the utility and trustworthiness of LLMs in the self-cognition state, revealing that this state enhances some specific tasks such as creative writing and exaggeration. We believe that our work can serve as an inspiration for further research on self-cognition in LLMs.

en cs.CL, cs.AI
arXiv Open Access 2024
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models

Adam Karvonen

Language models have shown unprecedented capabilities, sparking debate over the source of their performance. Is it merely the outcome of learning syntactic patterns and surface-level statistics, or do they extract semantics and a world model from the text? Prior work by Li et al. investigated this by training a GPT model on synthetic, randomly generated Othello games and found that the model learned an internal representation of the board state. We extend this work into the more complex domain of chess, training on real games and investigating our model's internal representations using linear probes and contrastive activations. The model is given no a priori knowledge of the game and is solely trained on next character prediction, yet we find evidence of internal representations of board state. We validate these internal representations by using them to make interventions on the model's activations and edit its internal board state. Unlike Li et al.'s prior synthetic dataset approach, our analysis finds that the model also learns to estimate latent variables like player skill to better predict the next character. We derive a player skill vector and add it to the model, improving the model's win rate by up to 2.6 times.
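The linear-probe technique mentioned above can be sketched in a few lines: fit a linear map from activations to a hidden feature and check whether it predicts the feature well. The synthetic activations and least-squares probe below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for residual-stream activations: a hidden binary
# "board feature" (e.g. whether a square is occupied) is linearly embedded
# in the activations along a fixed direction, plus noise.
n, dim = 200, 16
direction = rng.normal(size=dim)
labels = rng.integers(0, 2, size=n)
activations = rng.normal(size=(n, dim)) + np.outer(labels, direction)

# Linear probe: ordinary least squares from activations to the label.
w, *_ = np.linalg.lstsq(activations, labels.astype(float), rcond=None)
preds = (activations @ w) > 0.5
accuracy = (preds == labels.astype(bool)).mean()
print(accuracy)  # high only if the feature is linearly recoverable
```

High probe accuracy is evidence that the feature is linearly encoded; the intervention step the abstract describes then amounts to adding or subtracting such a direction (like the player-skill vector) from the activations.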

en cs.LG, cs.CL
arXiv Open Access 2024
Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora

Erik Derner, Sara Sansalvador de la Fuente, Yoan Gutiérrez et al.

Large language models (LLMs) often inherit and amplify social biases embedded in their training data. A prominent social bias is gender bias. In this regard, prior work has mainly focused on gender stereotyping bias - the association of specific roles or traits with a particular gender - in English and on evaluating gender bias in model embeddings or generated outputs. In contrast, gender representation bias - the unequal frequency of references to individuals of different genders - in the training corpora has received less attention. Yet such imbalances in the training data constitute an upstream source of bias that can propagate and intensify throughout the entire model lifecycle. To fill this gap, we propose a novel LLM-based method to detect and quantify gender representation bias in LLM training data in gendered languages, where grammatical gender challenges the applicability of methods developed for English. By leveraging the LLMs' contextual understanding, our approach automatically identifies and classifies person-referencing words in gendered language corpora. Applied to four Spanish-English benchmarks and five Valencian corpora, our method reveals substantial male-dominant imbalances. We show that such biases in training data affect model outputs, but can surprisingly be mitigated by leveraging small-scale training on datasets that are biased towards the opposite gender. Our findings highlight the need for corpus-level gender bias analysis in multilingual NLP. We make our code and data publicly available.
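As a highly simplified stand-in for the paper's LLM-based classifier (which this sketch does not reproduce), counting person-referencing words against a toy lexicon shows what a representation-bias measurement ultimately computes; the word lists, function, and example sentence are assumptions.

```python
# Toy lexicons standing in for the LLM's classification of
# person-referencing words (illustrative only; real gendered-language
# corpora need contextual disambiguation, which is the paper's point).
MALE = {"man", "men", "father", "he", "him"}
FEMALE = {"woman", "women", "mother", "she", "her"}

def representation_ratio(tokens):
    """Return (male_count, female_count) over person-referencing tokens."""
    m = sum(t in MALE for t in tokens)
    f = sum(t in FEMALE for t in tokens)
    return m, f

corpus = "the man spoke and the woman answered while he left".split()
print(representation_ratio(corpus))  # (2, 1)
```

A male-dominant imbalance, as the paper reports for its Spanish and Valencian corpora, would show up here as a first count consistently exceeding the second across the corpus.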

en cs.CL, cs.CY
arXiv Open Access 2024
Testing MediaPipe Holistic for Linguistic Analysis of Nonmanual Markers in Sign Languages

Anna Kuznetsova, Vadim Kimmelman

Advances in Deep Learning have made possible reliable landmark tracking of human bodies and faces that can be used for a variety of tasks. We test a recent Computer Vision solution, MediaPipe Holistic (MPH), to find out if its tracking of facial features is reliable enough for a linguistic analysis of data from sign languages, and compare it to an older solution (OpenFace, OF). We use an existing data set of sentences in Kazakh-Russian Sign Language and a newly created small data set of videos with head tilts and eyebrow movements. We find that MPH does not perform well enough for linguistic analysis of eyebrow movement - but in a different way from OF, which also performs poorly without correction. We reiterate a previous proposal to train additional correction models to overcome these limitations.

en cs.CV
arXiv Open Access 2024
Morphology and Syntax of the Tamil Language

Kengatharaiyer Sarveswaran

This paper provides an overview of the morphology and syntax of the Tamil language, focusing on its contemporary usage. The paper also highlights the complexity and richness of Tamil in terms of its morphological and syntactic features, which will be useful for linguists analysing the language and conducting comparative studies. In addition, the paper will be useful for those developing computational resources for the Tamil language. Its practical value is demonstrated by the fact that a rule-based morphological analyser cum generator and a computational grammar for Tamil have already been developed based on this paper. To enhance accessibility for a broader audience, the analysis is conducted without relying on any specific grammatical formalism.

en cs.CL
DOAJ Open Access 2023
Local Aspects of Feminism in “Jangloos”: An Analytical Study

Waseem Abbas, Dr. Aziz Ibn ul Hassan

Shaukat Siddiqui is a well-known progressive short story writer and novelist in Urdu literature. His novel "Jangloos" comprehensively portrays Pakistani society. The following research is an attempt to analyze the novel "Jangloos" through a feminist lens, with this critical approach localized at length. The analysis of "Jangloos" is therefore carried out using local feminist approaches. The research focuses on issues such as the exploitation, marginalization, and oppression that women face in "Jangloos". The research is thus not only a textual analysis but also a contextual and cultural study of the novel. In the context of this novel, an attempt has been made to explore the role of women in agricultural production and the vulnerability of women under rural customs.

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
DOAJ Open Access 2023
Factors affecting the quality and effectiveness of student teachers during their practicum experiences: the case of some selected colleges in Oromia, Ethiopia

Hika Negash Galana, Adinew Tadesse Degago, Alemayehu Getachew Tsegaye et al.

Abstract In this study, an attempt was made to investigate the constraints encountered by student teachers during their practicum experiences at selected colleges in Oromia, Ethiopia. Adopting a convergent mixed research design, a questionnaire was distributed to student teachers, and semi-structured interviews were conducted with supervisors and mentors. The questionnaire data were analyzed using descriptive statistics, inferential statistics, and one-way ANOVA, while the interview results were analyzed using content analysis. The findings identified factors such as mentors' lack of continuous follow-up and support, of willingness to share experience, and of friendliness. In addition, supervisors did not provide continuous follow-up and support, and there was no coordination between supervisors and mentors. Further, colleges assign large numbers of candidates to one school and allot many student teachers to one academic supervisor; the opportunity given for practice was inadequate; and the cooperating schools lacked necessary facilities. Hence, it can be concluded that mentors, supervisors, colleges, and cooperating schools fell short in playing their roles in teaching practice. Based on these findings and conclusions, practicum mentors and supervisors should provide continuous follow-up and immediate feedback to their student teachers, and should collaborate in evaluating and equipping them with everything necessary. Colleges should build good rapport with cooperating schools, try to provide the necessary facilities, and strengthen those schools so that they can effectively help produce qualified students; they should also work on balancing the number of student teachers against the number of supervisors and schools. Finally, cooperating schools should learn from the identified limitations and do more to fulfill their responsibilities.

Special aspects of education, Language acquisition

Page 47 of 221663