Learning to understand speech appears almost effortless for typically developing infants, yet from an information-processing perspective, acquiring a language from acoustic speech is an enormous challenge. This chapter reviews recent developments in using computational models to understand early language acquisition from speech and audiovisual input. The focus is on self-supervised and visually grounded models of perceptual learning. We show how these models are becoming increasingly powerful in learning various aspects of speech without strong linguistic priors, and how many features of early language development can be explained through a shared set of learning principles, principles broadly compatible with multiple theories of language acquisition and human cognition. We also discuss how modern learning simulations are gradually becoming more realistic, both in terms of input data and in linking model behavior to empirical findings on infant language development.
Nowadays, literacy is very important, and critical thinking should be integrated into teaching and learning. This research therefore aimed to develop a Higher Order Thinking Skills (HOTS) Based Inferential Reading Module that is suitable and practical for students of the English Education Study Program. This was developmental research using the ADDIE model, which consists of five stages: analyze, design, develop, implement, and evaluate. Based on the validity test by two expert validators, the module gained a mean percentage of 84.24%, categorized as very valid in terms of content eligibility, linguistics, presentation, and graphics components. In addition, based on the students' practicality test covering ease of use, efficiency of learning time, and benefits, the module scored 81.13%, categorized as very practical. It can be concluded that the Higher Order Thinking Skills (HOTS) Based Inferential Reading Module is suitable, practical, and beneficial for the students of the English Education Study Program at UIN Sulthan Thaha Saifuddin Jambi.
Language. Linguistic theory. Comparative grammar, English language
Andi Rustandi, R. Bunga Febriani, Bambang Ruby Sugiarto
This study investigated the interface between pragmatic and grammatical competence in second language acquisition, aiming to establish a conceptual framework for how pragmatic competence interfaces with grammatical competence. It addresses three research questions: (1) how pragmatic competence and grammatical competence interface with each other, (2) what kinds of pragmatic competence should be learned by second language learners, and (3) to what extent grammatical constructions interface with the pragmatic domain. The study employed the library research method, building its theoretical framework by searching, reading, and evaluating relevant articles from online journals dealing with the topic. The results revealed that pragmatic competence and grammatical competence interface with each other, since the grammatical constructions in an utterance contribute to the expression of language use.
Language. Linguistic theory. Comparative grammar, Theory and practice of education
The publicist Mihael Kunić is little known to the Croatian cultural public. A Slovak by origin and, by profession, a retired pedagogue and school inspector in the lands of the Habsburg Monarchy, he spent the last part of his life in Croatia, where, actively participating in the social life of Zagreb, Karlovac, and Varaždin, as well as in the country life of the Croatian nobility, especially in Slavonia, he recorded numerous details that are valuable today for understanding bourgeois culture in the first decades of the 19th century. From the period of his life before his arrival in the Croatian lands come his numerous works in German in the fields of pedagogy, linguistics, history, and horticulture; following and elaborating his own biographical method, Kunić also achieved a lexicographic undertaking that was major even on a European scale, publishing the multi-volume biographical lexicon of notable people of the Habsburg Monarchy, Biographien merkwürdiger Männer der Österreichischen Monarchie (1805–12). In the domestic literature, he is mostly known for his historical-topographical studies and his works on bourgeois, noble, and public gardens and park architecture, with which he contributed substantially to the knowledge of the history of garden art and of cultural conditions in Croatia at the beginning of the 19th century. Besides these, in line with Enlightenment efforts to describe the lives of people working for the common good, he also published a series of biographical contributions on prominent figures of pre-revival Croatia and numerous occasional poems marking various public and private events. Alongside a historical overview of the development of Croatian biographical writing, the paper considers the criteria for selecting the persons treated in Kunić's opus, the motives and circumstances of his work, and his contribution to the development of this discipline, with an emphasis on the biography of Josip Sermage, in which he describes his methodological procedures, and on the circle of persons connected with the Zagreb Kaptol as part of the emerging bourgeois class.
Assessing language proficiency is essential in education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts into proficiency levels according to the Common European Framework of Reference for Languages (CEFR). To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model, and a probing-based approach that utilizes the internal neural states of the LLM for classification. Our results show a consistent performance improvement over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.
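The probing idea, training a lightweight linear classifier on frozen internal representations, can be illustrated schematically. In the sketch below, random Gaussian vectors stand in for real LLM hidden states, and the six CEFR levels are collapsed into a binary toy problem; nothing here reproduces the paper's actual setup.

```python
import numpy as np

# Toy probe: a logistic-regression classifier on frozen "hidden states".
# Synthetic Gaussian clusters stand in for real LLM representations.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 16)),   # class 0 (e.g. "A-level" texts)
               rng.normal(2, 1, (50, 16))])  # class 1 (e.g. "C-level" texts)
y = np.array([0] * 50 + [1] * 50)

# Train the probe with plain gradient descent on the logistic loss;
# the "LLM" itself stays frozen, only w and b are learned.
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

acc = np.mean((X @ w + b > 0) == y)
print(f"probe accuracy: {acc:.2f}")
```

In a real pipeline the feature vectors would come from a chosen layer of the LLM (for example, mean-pooled hidden states), but the training loop for the probe itself is as small as shown.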
The rapid advancement of large language models (LLMs) has enabled role-playing language agents to demonstrate significant potential in various applications. However, relying solely on prompts and contextual inputs often proves insufficient for achieving deep immersion in specific roles, particularly well-known fictional or public figures. Fine-tuning-based approaches, on the other hand, face limitations stemming from the challenges of data collection and the computational resources required for training, which restrict their broader applicability. To address these issues, we propose Test-Time-Matching (TTM), a training-free role-playing framework based on test-time scaling and context engineering. TTM uses LLM agents to automatically decouple a character's features into personality, memory, and linguistic style. Our framework involves a structured, three-stage generation pipeline that utilizes these features for controlled role-playing. It achieves high-fidelity role-playing performance and also enables seamless combinations across diverse linguistic styles, and even variations in personality and memory. We evaluate our framework through human assessment, and the results demonstrate that our method achieves outstanding performance in generating expressive and stylistically consistent character dialogues.
Kristin Gnadt, David Thulke, Simone Kopeinik
et al.
In recent years, various methods have been proposed to evaluate gender bias in large language models (LLMs). A key challenge lies in the transferability of bias measurement methods initially developed for English when applied to other languages. This work contributes to this research strand by presenting five German datasets for gender bias evaluation in LLMs. The datasets are grounded in well-established concepts of gender bias and are accessible through multiple methodologies. Our findings, reported for eight multilingual LLMs, reveal unique challenges associated with gender bias in German, including the ambiguous interpretation of male occupational terms and the influence of seemingly neutral nouns on gender perception. This work contributes to the understanding of gender bias in LLMs across languages and underscores the necessity of tailored evaluation frameworks.
The GPT (Generative Pre-trained Transformer) language models are an artificial intelligence and natural language processing technology that enables automatic text generation. There is growing interest in applying GPT language models to university teaching in various dimensions. From the perspective of innovation in student and teacher activities, they can provide support in understanding and generating content, problem-solving, personalization, and test correction, among others. From the dimension of internationalization, the misuse of these models represents a global problem that requires a series of common measures in universities across different geographical areas. In several countries, assessment tools have been reviewed to ensure that work is done by students and not by AI. To this end, we have conducted a detailed experiment in Software Engineering, a representative Computer Science subject, focused on evaluating the use of ChatGPT as an assistant in theory activities, exercises, and laboratory practices, and assessing its potential as a support tool for both students and teachers.
Kamyar Zeinalipour, Neda Jamshidi, Monica Bianchini
et al.
Pre-trained LLMs have demonstrated substantial capabilities across a range of conventional natural language processing (NLP) tasks, such as summarization and entity recognition. In this paper, we explore the application of LLMs in the generation of high-quality protein sequences. Specifically, we adopt a suite of pre-trained LLMs, including Mistral-7B, Llama-2-7B, Llama-3-8B, and gemma-7B, to produce valid protein sequences. All of these models are publicly available. Unlike previous work in this field, our approach utilizes a relatively small dataset comprising 42,000 distinct human protein sequences. We retrain these models to process protein-related data, ensuring the generation of biologically feasible protein structures. Our findings demonstrate that even with limited data, the adapted models exhibit efficiency comparable to established protein-focused models such as ProGen varieties, ProtGPT2, and ProLLaMA, which were trained on millions of protein sequences. To validate and quantify the performance of our models, we conduct comparative analyses employing standard metrics such as pLDDT, RMSD, TM-score, and REU. Furthermore, we commit to making the trained versions of all four models publicly available, fostering greater transparency and collaboration in the field of computational biology.
We introduce bonding grammars, a graph grammar formalism developed to model DNA computation by means of graph transformations. It is a modification of fusion grammars introduced by Kreowski, Kuske and Lye in 2017. Bonding is a graph transformation that consists of merging two hyperedges into a single larger one. We show why bonding models interaction between DNA molecules better than fusion. Then, we investigate formal properties of this formalism. Firstly, we study the relation between bonding grammars and hyperedge replacement grammars proving that each of these kinds of grammars generates a language the other one cannot generate. Secondly, we prove that bonding grammars naturally generalise regular sticker systems. Finally, we prove that the membership problem for bonding grammars is NP-complete and, moreover, that some bonding grammar generates an NP-complete set.
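The core bonding operation, merging two hyperedges into a single larger one, can be illustrated with a toy sketch. The representation below (a hyperedge as a label plus a tuple of attachment nodes) is invented purely for illustration and does not follow the paper's formal definitions.

```python
# Toy sketch of one "bonding" step: two hyperedges of a hypergraph are
# merged into a single larger hyperedge attached to the concatenation
# of their attachment nodes. Representation invented for illustration.

def bond(hypergraph, e1, e2, new_label):
    """Replace hyperedges e1 and e2 with one hyperedge whose attachment
    sequence is e1's followed by e2's (order kept, duplicates kept)."""
    assert e1 in hypergraph and e2 in hypergraph and e1 != e2
    merged = (new_label, e1[1] + e2[1])
    return [e for e in hypergraph if e not in (e1, e2)] + [merged]

g = [("A", (1, 2)), ("B", (2, 3)), ("C", (3, 4))]
g2 = bond(g, ("A", (1, 2)), ("B", (2, 3)), "AB")
print(g2)  # [('C', (3, 4)), ('AB', (1, 2, 2, 3))]
```

The intuition matches the DNA reading: two molecules (hyperedges) stick together into one larger complex, rather than fusing nodes away as in fusion grammars.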
In this study, we delve into the validity of conventional personality questionnaires in capturing the human-like personality traits of Large Language Models (LLMs). Our objective is to assess the congruence between the personality traits LLMs claim to possess and their demonstrated tendencies in real-world scenarios. By conducting an extensive examination of LLM outputs against observed human response patterns, we aim to understand the disjunction between self-knowledge and action in LLMs.
Gender bias is not only prevalent in Large Language Models (LLMs) and their training data, but also firmly ingrained into the structural aspects of language itself. Therefore, adapting linguistic structures within LLM training data to promote gender-inclusivity can make gender representations within the model more inclusive. The focus of our work are gender-exclusive affixes in English, such as in 'show-girl' or 'man-cave', which can perpetuate gender stereotypes and binary conceptions of gender. We use an LLM training dataset to compile a catalogue of 692 gender-exclusive terms along with gender-neutral variants and from this, develop a gender-inclusive fine-tuning dataset, the 'Tiny Heap'. Fine-tuning three different LLMs with this dataset, we observe an overall reduction in gender-stereotyping tendencies across the models. Our approach provides a practical method for enhancing gender inclusivity in LLM training data and contributes to incorporating queer-feminist linguistic activism in bias mitigation research in NLP.
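The catalogue-driven rewriting described above can be sketched as a simple substitution pass over training text. The three catalogue entries below are invented examples in the spirit of the 692-term catalogue; they are not taken from the actual 'Tiny Heap' dataset, and a real pipeline would also need to handle morphology and pronoun agreement.

```python
import re

# Illustrative catalogue of gender-exclusive terms mapped to neutral
# variants. These three entries are hypothetical examples, not entries
# from the paper's actual catalogue.
CATALOGUE = {
    "show-girl": "performer",
    "man-cave": "den",
    "chairman": "chairperson",
}

def neutralise(text):
    """Replace each catalogued gender-exclusive term with its neutral
    variant (case-insensitive literal matching)."""
    for exclusive, neutral in CATALOGUE.items():
        text = re.sub(re.escape(exclusive), neutral, text, flags=re.IGNORECASE)
    return text

print(neutralise("The chairman retreated to his man-cave."))
# The chairperson retreated to his den.
```

Applying such a pass to a slice of training data yields the kind of gender-inclusive fine-tuning corpus the abstract describes.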
In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic model, pronunciation dictionary, and language model components of the traditional Automatic Speech Recognition (ASR) pipeline, making human-like recognition possible without building a pronunciation dictionary in advance. However, because training data for code-switching is relatively scarce, ASR models tend to degrade drastically when encountering this phenomenon. Most past studies have reduced the model's learning complexity by splitting the code-switching task into multiple single-language tasks and learning the domain-specific knowledge of each language separately. In this paper, we instead introduce language identification information into the middle layers of the ASR model's encoder, aiming to generate acoustic features that convey language distinctions in a more implicit way and reduce the model's confusion when dealing with language switching.
The article analyzes Man Is Canceled (2007), a novel by the modern Russian writer A. Potyomkin that received a wide public response. The main idea of the work is the need for a radical change of the “mass man” of the turn of the XX-XXI centuries at the psychosomatic level. The ideological and compositional center is the specially built Rimushkino estate in the Oryol province, where the serf spirit of the Russian Empire at the turn of the XVIII-XIX centuries is reproduced. To answer the question of why the estate space of Russia became the most representative field for the anthropological experiments of the beginning of the XXI century, we consider the estate neo-myths of the Silver Age (the lost paradise on earth) and of the Soviet period (the camp hell living on in the mentality), as well as the imperial-colonial concept of the postmodern era (the estate as a frontier in the process of class-oriented internal colonization of the country). The multidimensional semiotics of the estate sets a new relationship between the “metropolitan” and the “provincial” with respect to the other loci of the novel.
Language. Linguistic theory. Comparative grammar, Style. Composition. Rhetoric
We continue the research on the generative capacity of contextual grammars where contexts are adjoined around whole words (externally) or around subwords (internally) which belong to special regular selection languages. All languages generated by contextual grammars where all selection languages are elements of a certain subregular language family form again a language family. We investigate contextual grammars with strictly locally testable selection languages and compare those families to families which are based on finite, monoidal, nilpotent, combinational, definite, suffix-closed, ordered, commutative, circular, non-counting, power-separating, or union-free languages.
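The two adjoining modes, external (contexts wrapped around the whole word) and internal (contexts wrapped around a subword), can be sketched in a few lines. Selection languages are given here as plain finite sets purely for illustration; the families studied in the paper are subregular language classes, not finite sets.

```python
# Toy sketch of one derivation step in a contextual grammar. A context
# (u, v) is adjoined externally around the whole word, or internally
# around a subword, only if the wrapped word/subword belongs to the
# associated selection language (a finite set here, for illustration).

def external_step(word, context, selection):
    u, v = context
    return u + word + v if word in selection else None

def internal_steps(word, context, selection):
    u, v = context
    results = set()
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            if word[i:j] in selection:
                results.add(word[:i] + u + word[i:j] + v + word[j:])
    return results

sel = {"ab"}
print(external_step("ab", ("c", "d"), sel))             # cabd
print(sorted(internal_steps("abab", ("c", "d"), sel)))  # ['abcabd', 'cabdab']
```

Restricting which sets may serve as `selection` (strictly locally testable, definite, ordered, and so on) is exactly what yields the different language families the abstract compares.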
Abstract Previous research has tracked the history of the theme-recipient alternation (or: “dative” alternation) in Chinese, but few studies have embedded their analysis in a probabilistic variationist framework. Against this backdrop, we explore the language-internal and language-external factors that probabilistically influence the alternation between theme-first and recipient-first ordering in a large diachronic corpus of Chinese writing (1300s–1900s). Our analysis reveals that the recipient-first variant is consistently more frequent than its competitor and even more common in more recent texts than in older texts. Regression analysis also suggests that there are stable linguistic constraints (i.e., animacy and definiteness of theme) and fluid constraints (i.e., end-weight, recipient animacy). Notably, the diachronic instability of end-weight and animacy points to cross-linguistic parallels for ditransitive constructions, including the English dative alternation. We thus contribute to theory building in variationist linguistics by advancing the field’s knowledge about the comparative fluidity versus stability of probabilistic constraints.
This pioneering volume lays out a set of methodological principles to guide the description of interpersonal grammar in different languages. It compares interpersonal systems and structures across a range of world languages, showing how discourse, interpersonal relationships between the speakers, and the purpose of their communication, all play a role in shaping the grammatical structures used in interaction. Following an introduction setting out these principles, each chapter focuses on a particular language - Khorchin Mongolian, Mandarin, Tagalog, Pitjantjatjara, Spanish, Brazilian Portuguese, British Sign Language and Scottish Gaelic – and explores mood, polarity, tagging, vocation, assessment and comment systems. The book provides a model for functional grammatical description that can be used to inform work on system and structure across languages as a foundation for functional language typology.
The co-construction of the online learning environment.
Ethnography of first e-learning term in lower secondary school during the pandemic
This essay investigates how the online learning environment was built in Italian junior high schools (ages 11 to 14) during the first COVID-19 lockdown period (February–June 2020). Teachers and pupils found themselves projected into an unknown virtual space, almost never explored at school before the surge of the pandemic. They had to engage in adventurous explorations of this new territory, conquering new spaces and subduing the rigid structures of educational platforms and official regulatory codes.
Ethnology. Social and cultural anthropology, Language. Linguistic theory. Comparative grammar
We observe a recent behaviour on social media, in which users intentionally remove consonantal dots from Arabic letters, in order to bypass content-classification algorithms. Content classification is typically done by fine-tuning pre-trained language models, which have been recently employed by many natural-language-processing applications. In this work we study the effect of applying pre-trained Arabic language models on "undotted" Arabic texts. We suggest several ways of supporting undotted texts with pre-trained models, without additional training, and measure their performance on two Arabic natural-language-processing downstream tasks. The results are encouraging; in one of the tasks our method shows nearly perfect performance.
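The undotting transformation that users apply can be sketched as a character-level mapping from dotted Arabic letters to their dotless skeleton forms. The partial table below covers common cases only and does not reproduce the mapping used in the paper.

```python
# Illustrative (partial) undotting map: each dotted Arabic letter is
# replaced by a dotless skeleton form. This table is an illustration,
# not the paper's actual mapping.
UNDOT = {
    "ب": "ٮ", "ت": "ٮ", "ث": "ٮ", "ن": "ٮ", "ي": "ى",
    "ج": "ح", "خ": "ح", "ذ": "د", "ز": "ر", "ش": "س",
    "ض": "ص", "ظ": "ط", "غ": "ع", "ف": "ڡ", "ق": "ٯ",
    "ة": "ه",
}

def undot(text):
    """Strip consonantal dots character by character."""
    return "".join(UNDOT.get(ch, ch) for ch in text)

print(undot("يذهب"))  # ىدهٮ
```

Because several dotted letters collapse onto one skeleton (e.g. ب/ت/ث/ن all become ٮ), the transformation is lossy, which is precisely what makes undotted text hard for models trained on standard orthography.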
This study aimed to find out the types of Socratic questions used by seventh-semester students in the English Department of FKIP UHN Medan. The research used a descriptive qualitative design; the subjects were seventh-semester English Department students at Nommensen University in the 2019/2020 academic year in a Seminar Class, and the data were the students' questions. After analyzing the questions, it was found that the Socratic questions used by the seventh-semester English students in FKIP UHN Medan were questions of clarification; questions that probe purposes; questions that probe assumptions; questions that probe information, reason, evidence, and cause; questions about viewpoints or perspectives; questions that probe implications and consequences; questions about questions; questions that probe concepts; and questions that probe inferences and interpretations. The most dominant type was questions that probe information, reason, evidence, and cause. This indicates that the students' ability to pose questions in the seminar on ELT presentations remained at the level of extracting information from the text, that they had limited capacity to view or judge things from other perspectives, and that they did little preparatory reading of the seminar paper before the presentation started. The writer assumed that the students read too little.