Results for "Computational linguistics. Natural language processing"

CrossRef Open Access 2025

AI Linguistics

Guosheng Zhang

en

DOAJ Open Access 2025

ICT, CURRICULUM REFORM AND NEW DELIVERY MODES: A CASE STUDY OF THE SCHOOL OF CRIMINOLOGY OF THE UNIVERSITY OF KINSHASA

Raoul KIENGE-KIENGE INTUDI and Patrick PIDIKA KIHANGU

Abstract: This article describes the positive and bold experience of the School of Criminology of the University of Kinshasa, which strives to organize distance teaching by successfully connecting teachers and learners located in different cities, provinces and even countries through Information and Communication Technologies (ICT), in a general context marked by the reluctance of higher-education and university institutions to use these tools as a mode of delivery. Data were collected through documentary analysis and semi-structured interviews with teachers and learners involved in this master's programme in criminology. The findings help raise awareness among African intellectuals, each in their own field of expertise and specialization, of the intellectual and academic freedom they possess to design curricula adapted to local needs, while using ICT to build a network of academics working on African issues. Keywords: School of Criminology of the University of Kinshasa, ICT, ICT in education (TICE), criminology, intellectual and academic freedom.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2025

Automatic generation of ESL learning materials based on CEFR levels using reinforcement-tuned LLMs

Yi Zuo

Abstract: The automatic generation of CEFR-aligned learning materials remains a challenging task due to the difficulty of balancing linguistic accuracy, scalability, and adaptability. Existing approaches, from rule-based templates to large language models, often fail to achieve strict CEFR alignment while maintaining readability across multi-paragraph content. To address this gap, we propose a reinforcement-learning-tuned LLM framework that integrates CEFR feature extraction, multi-objective reward shaping, and constrained decoding into a unified architecture. The framework enables dynamic adjustment of text complexity while ensuring level consistency. Experimental results show that our method improves CEFR-level classification accuracy by up to 12.3% at B2-C1 levels compared with state-of-the-art baselines, and reduces misalignment errors by 15.6%. Furthermore, attention visualization confirms that the policy network effectively focuses on complex syntactic structures during intermediate-level generation. These findings highlight not only the effectiveness of reinforcement learning in structured text generation but also the potential of constrained optimization as a scalable methodology for fine-grained linguistic control.

Computational linguistics. Natural language processing, Electronic computers. Computer science

arXiv Open Access 2025

Linguists should learn to love speech-based deep learning models

Marianne de Heer Kloots, Paul Boersma, Willem Zuidema

Futrell and Mahowald present a useful framework bridging technology-oriented deep learning systems and explanation-oriented linguistic theories. Unfortunately, the target article's focus on generative text-based LLMs fundamentally limits fruitful interactions with linguistics, as many interesting questions on human language fall outside what is captured by written text. We argue that audio-based deep learning models can and should play a crucial role.

en cs.CL, cs.SD

CrossRef Open Access 2024

Preface to the Special Issue on Computational Linguistics and Natural Language Processing

Peter Z. Revesz

Computational linguistics and natural language processing are at the heart of the AI revolution that is currently transforming our lives [...]

en

DOAJ Open Access 2024

The conjunctions parce que, car and puisque: revisiting previous analyses and proposing an “attentional” criterion

Lidia Lebas-Fraczak

Most previous analyses of the causal conjunctions parce que, car and puisque use an illocutionary criterion, opposing “explanation” and “justification”, and/or a presuppositional criterion. Other criteria are used to complete these analyses, such as the type of causality, the degree of subjectivity or the number of acts of enunciation. Through a review of the main contributions on the subject, we show that these various criteria have limits, giving rise to divergent analyses and failing to draw a real functional distinction between the three morphemes. We hypothesize that such a distinction can be made using a criterion of a pragmatic and interlocutory nature, which can be considered attentional, applying the notion of focusing.

Philology. Linguistics, Computational linguistics. Natural language processing

DOAJ Open Access 2024

Features of the Fantastic Novel in Ancient Arabic Narrative

Salima Bennour & Ayeb Fatma Zohra

Abstract: This study explores the origins and distinctive characteristics of the novel genre by tracing its roots to ancient Arabic storytelling traditions such as fables, folklore and classic works like One Thousand and One Nights. The fantastic novel embraces elements of the supernatural, the impossible and the marvelous, breaking away from realism in favor of storytelling. Early Arabic narratives featuring gods, jinn, magic and transformations evoked a sense of awe and wonder, laying the groundwork for the emergence of fantastic Arabic literature. Influenced by climates and a desire to explore forms rooted in Arab-Islamic heritage, contemporary Arab writers have incorporated techniques like irony, exaggeration, distortion, intertextuality and mystical themes into their works. The fantastic genre has provided a platform for authors to express dissenting views and suppressed desires through allegory. By examining these origins, we can shed light on the features and development of this imaginative literary genre. Keywords: Arabic narrative, fantastic novel, supernatural, fable, folktales

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2024

Non-verbal predications in Zarma

Mahamane L. Abdoulaye, Salimata Abdoulrazikou

This article presents new findings on the use of the copulas nôo ‘be’ and ti ‘be’ in non-verbal predications in Zarma (Songhay; Niger, Nigeria). Based on certain exclusive contexts of use and morphosyntactic criteria, the article distinguishes a basic one-term predication “NP + nôo” used in deictic identification (e. g.: Abdù nôo ‘it’s Abdu’) and a two-term predication “NP1 + NP2 + nôo” used in nominal predications and equative sentences (e. g.: wodìn Abdù nôo ‘that is Abdu’). The article shows that the copula ti replaces the copula nôo in negation but also in non-verbal focus constructions, where it is generally preceded by the subordinating conjunction kà/gà and very likely marks the presupposed part of the sentence (e. g.: [Muusà nôo] kà ti càwkŏo ‘[it’s Musa] who is a student’). Finally, the article shows that in Zarma it is the one-term predication “NP + nôo” that is recruited to mark focus-fronted constituents of verbal and non-verbal predications, thus confirming an observation already made about other languages.

Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar

arXiv Open Access 2024

Model-based Large Language Model Customization as Service

Zhaomin Wu, Jizhou Guo, Junyi Hou et al.

Prominent Large Language Model (LLM) services from providers like OpenAI and Google excel at general tasks but often underperform on domain-specific applications. Current customization services for these LLMs typically require users to upload data for fine-tuning, posing significant privacy risks. While differentially private (DP) data synthesis presents a potential alternative, it commonly yields low effectiveness because of the excessive noise that DP introduces into the data. To overcome this, we introduce Llamdex, a novel framework that facilitates LLM customization as a service, where the client uploads pre-trained domain-specific models rather than data. This client-uploaded model, optionally protected by DP with much lower noise, is inserted into the base LLM via connection modules. Significantly, these connection modules are trained without requiring sensitive domain data, enabling clients to customize LLM services while preserving data privacy. Experiments demonstrate that Llamdex improves domain-specific accuracy by up to 26% over state-of-the-art private data synthesis methods under identical privacy constraints and, by obviating the need for users to provide domain context within queries, maintains inference efficiency comparable to the original LLM service.

en cs.LG, cs.AI

arXiv Open Access 2024

A Legal Framework for Natural Language Processing Model Training in Portugal

Rúben Almeida, Evelin Amorim

Recent advances in deep learning have led to many computational systems capable of performing intelligent actions that were previously restricted to the human intellect. In the case of human languages, these advances made possible applications like ChatGPT that can generate coherent text without being explicitly programmed to do so. Instead, these models use large volumes of textual data to learn meaningful representations of human languages. Alongside these advances, concerns have emerged about copyright and data-privacy infringements caused by these applications. Despite these concerns, new natural language processing applications have continued to be developed at a pace that far outstrips the introduction of new regulations. Today, communication barriers between legal experts and computer scientists lead to many unintentional legal infringements during the development of such applications. In this paper, a multidisciplinary team aims to bridge this communication gap and promote more compliant Portuguese NLP research by presenting a series of everyday NLP use cases while highlighting the Portuguese legislation that may apply during their development.

en cs.CL, cs.ET

DOAJ Open Access 2023

A contribution to the study of the subject function in Akwá, Bantu C22 of the Republic of the Congo

Guy-Roger Cyriac Gombé-Apondza

Abstract: This analysis examines the subject function in Akwá, a Bantu language of the northern Republic of the Congo, using the functionalist model through which many languages around the world have been studied. Its objective is to define this notion, drawing on previous work by several researchers on the topic. This work shows that the subject is a very complex notion whose exploration remains topical. Nevertheless, some linguists have identified criteria for recognizing it. These criteria, mostly distributional and functional, present the subject as the most important hierarchical determiner of the predicate. The subject can be realized by several kinds of monemes: nominals, affixed monemes, numerals and completive phrases. Keywords: subject function, syntactic subject, Akwá, Bantu C22, Republic of the Congo

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2023

Recursive recurrent neural network: A novel model for manipulator control with different levels of physical constraints

Zhan Li, Shuai Li

Abstract: Manipulators actuate joints to let end effectors perform precise path-tracking tasks. Recurrent neural networks, which are described by dynamic models with parallel processing capability, are a powerful tool for kinematic control of manipulators. Because of physical limitations and actuation saturation of manipulator joints, incorporating joint constraints into the kinematic control of manipulators is essential. However, existing manipulator control methods based on recurrent neural networks mainly handle limited levels of joint angular constraints, and to the best of our knowledge, recurrent-neural-network methods for kinematic control of manipulators with higher-order joint constraints have not yet been reported. In this study, a novel recursive recurrent neural network model is proposed, for the first time, to solve the kinematic control problem for manipulators with different levels of physical constraints; the proposed network can be formulated as a new manifold system that keeps the control solution within all joint constraints of different orders. Theoretical analysis establishes the stability of the proposed recursive recurrent neural network and its convergence to the solution. Simulation results on the Kuka manipulator system further demonstrate the effectiveness of the proposed method for end-effector path-tracking control under different levels of joint constraints. Comparisons with other methods, such as the pseudoinverse-based method and the conventional recurrent neural network method, substantiate the superiority of the proposed method.

Computational linguistics. Natural language processing, Computer software

DOAJ Open Access 2023

Empirical Methods for the Study of Denotation in Nominalizations in Spanish

Aina Peris, Mariona Taulé, Horacio Rodríguez

Computational linguistics. Natural language processing

arXiv Open Access 2023

Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities

Aaron J. Snoswell, Lucinda Nelson, Hao Xue et al.

Generic 'toxicity' classifiers continue to be used for evaluating the potential for harm in natural language generation, despite mounting evidence of their shortcomings. We consider the challenge of measuring misogyny in natural language generation, and argue that generic 'toxicity' classifiers are inadequate for this task. We use data from two well-characterised 'Incel' communities on Reddit that differ primarily in their degrees of misogyny to construct a pair of training corpora which we use to fine-tune two language models. We show that an open source 'toxicity' classifier is unable to distinguish meaningfully between generations from these models. We contrast this with a misogyny-specific lexicon recently proposed by feminist subject-matter experts, demonstrating that, despite the limitations of simple lexicon-based approaches, this shows promise as a benchmark to evaluate language models for misogyny, and that it is sensitive enough to reveal the known differences in these Reddit communities. Our preliminary findings highlight the limitations of a generic approach to evaluating harms, and further emphasise the need for careful benchmark design and selection in natural language evaluation.

en cs.CL, cs.CY

DOAJ Open Access 2022

OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction

Paola Velardi, Stefano Faralli, Roberto Navigli

Computational linguistics. Natural language processing

DOAJ Open Access 2022

“Ukraine above all” or “Pushkin is our everything”? (Ukrainian education and Russian literature)

Ю. Ковбасенко

The article examines three types of war that Russia has imposed on Ukraine in hybrid fashion: the “war of Ares”, the “war of Athena” and the “war of Apollo”, as well as the causes, course and results of using culture and literature as a glittering veil to mask the imperial essence of the “Russian world” and the “crooked mug of Russia” (M. Gogol). It analyzes why, despite the full-scale aggression launched by the Russian Federation against Ukraine on 24 February 2022 and the war crimes of the “rashists” established by international courts, reverence for Russia and its “great” literature and culture persists in broad circles of the world community and even of Ukrainian society. It concludes that the deadly (like “Snow White's apple”) combination of aesthetic appeal, on the one hand, and imperial ideological toxicity, on the other (especially under the Russian Federation's full-scale military aggression, when even the Russian language in which these works are written has become a trigger for millions of Ukrainians), makes Russian literature entirely unacceptable for study in Ukraine's general secondary education institutions. The article traces the origins and stages of the entrenchment of the myth of the “world greatness” of Russian literature and reaches the well-founded conclusion that the large share of Russian works in our school curricula is evidence not of their supposed “world-class” ideological and aesthetic level but of the heavy legacy of the imperial (including Soviet) era, when the population of lands colonized by Muscovy (Ukraine among them) underwent forced assimilation (“obrusenie”, Russification), so that everything Russian was imposed by force. It proposes effective ways of adjusting strategies for studying Russian literature in Ukraine's higher education institutions: intensive use of postcolonial interpretation and comparative analysis, renewal of the set of literary works studied, and new approaches to studying writers' biographies.
It notes that a strategic shift in teaching Russian literature and culture in Ukraine's higher education institutions will require titanic efforts not only from educators but from the entire state, including the development and implementation of a dedicated targeted state program. Keywords: “war of Apollo”, hybrid war, glorification of the imperial literary canon, imperial myth, national identity, postcolonial studies, “rashism”, semantic (paradigmatic) war, “troubadours of the Empire”.

Discourse analysis, Computational linguistics. Natural language processing

DOAJ Open Access 2022

Cameroonian literature in Spanish: a basis for acquiring sociocultural competences in teaching Spanish as a foreign language (E/LE)

Ibrahim ISSA

Abstract: Today, the teaching of Spanish as a foreign language in Cameroon goes beyond the acquisition of knowledge. There is therefore an opportunity to take into account sociocultural aspects related to the learners' environment in order to better achieve the objectives of the official curriculum. One of these objectives is to form citizens rooted in their own culture and open to that of others. In this context, Cameroonian literature in Spanish can convey Cameroon's sociocultural realities and serve as a basis for acquiring sociocultural competences. This article analyzes the sociocultural content of works by Cameroonian authors writing in Spanish and then proposes a teaching method for putting it into practice.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2022

Multilingual Expatriates in Poland and Their Attitudes Towards Learning Polish

Ewa Komorowska

The article presents partial results from an online survey conducted with a group of expatriates living and working in Poland. The main aim of the study was to examine the attitude of the target group towards learning Polish, and to identify the most important motivations for learning it or the reasons for not doing so. This problem has not been thoroughly researched before because expatriates were not previously treated as a separate research group. The results show, however, that they are an interesting object of study; they differ from other types of learners (e.g. academic learners) mostly in their motivations and attitudes towards learning Polish. The language profile of the expatriates is particularly noteworthy, especially their multilingualism, although not all of them are learning Polish. Their motivations for starting to learn the language, and their reasons for stopping or for never starting at all despite living in Poland for a long time, are also worth exploring.

Computational linguistics. Natural language processing, Semantics

arXiv Open Access 2022

Privacy-Preserving Models for Legal Natural Language Processing

Ying Yin, Ivan Habernal

Pre-training large transformer models with in-domain data improves domain adaptation and helps gain performance on domain-specific downstream tasks. However, sharing models pre-trained on potentially sensitive data is prone to adversarial privacy attacks. In this paper, we ask to what extent we can guarantee the privacy of pre-training data and, at the same time, achieve better downstream performance on legal tasks without the need for additional labeled data. We extensively experiment with scalable self-supervised learning of transformer models under the formal paradigm of differential privacy and show that under specific training configurations we can improve downstream performance without sacrificing privacy protection for the in-domain data. Our main contribution is utilizing differential privacy for large-scale pre-training of transformer language models in the legal NLP domain, which, to the best of our knowledge, has not been addressed before.

en cs.CL

DOAJ Open Access 2021

Title Index: Volume 28

Computational linguistics. Natural language processing
