Results for "Computational linguistics. Natural language processing"

Showing 20 of ~8,171,279 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

DOAJ Open Access 2026
LA NOTION DU TEMPS CHEZ LES SONGYE. UNE SEMANTIQUE MOTIVATIONNELLE SUR LES DENOMINATIONS (L 23)

Constantin KASENDWE MALALA

Abstract: Temporality is a notion so important that it governs and dictates the conduct of peoples throughout the world. It reflects the behaviour of many peoples in their everyday activities. In this analysis, we set out the values conveyed by the understandings and accepted meanings that Africans in general, and the Songye in particular, attach to the notion of time. Faced with the blowing winds of globalization, it is imperative to affirm our distinctiveness. Keywords: Time, Songye, Motivational semantics, Denominations.

Arts in general, Computational linguistics. Natural language processing
DOAJ Open Access 2025
Zwischen Anspruch und Wirklichkeit: Die sinkende Nachfrage nach Deutsch als Fremdsprache in Dänemark und Norwegen

Karen Bauer, Beate Lindemann

This study investigates the situation of German language education in Denmark and Norway. The focus is on the causes of declining interest and the measures taken by educational authorities to promote German language skills. The research addresses the measures proposed and recommended by national educational authorities, how such measures are implemented in educational institutions, and how well these measures are known and applied by local experts, such as German teachers and employees in the field of German language education/teacher education at colleges and universities. This study employed an online survey methodology to examine German language education in Denmark and Norway. More than 400 teachers and 40 employees from universities and colleges responded to the survey. Qualitative content analysis was then applied to the resulting data. The findings reveal a clear discrepancy in the practical application of governmental documents within school and higher education settings. The study suggests provision of specific German language curricula, ongoing teacher training, and sustained funding for innovative language education practices. It emphasizes the need for practical implementation of government strategies and long-term support for educational initiatives.

Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar
DOAJ Open Access 2025
Integrated monitoring and prediction artificial intelligent based expert system: a case study on hydroponics strawberry cultivation

Marwa Hassan, Noha H. El-Amary, Daniele Alberoni et al.

Abstract This paper presents an integrated monitoring and prediction system for managing the C/N ratio in hydroponic strawberry cultivation, utilizing an artificial neural network (ANN) and an adapted autoregressive integrated moving average (ARIMA) model. The ARIMA model is improved by incorporating the error between the ANN predictions and the actual C/N ratio, leading to higher predictive accuracy. The study leverages real-world data collected from a hydroponic strawberry farm, including environmental variables such as temperature, humidity, CO2 levels, pH level, moisture content, electrical conductivity, and nutrient uptake rates (nitrogen, phosphorus, potassium, calcium). The system accurately predicts the C/N ratio and provides timely alarms for deviations in nutrient levels and environmental conditions, ensuring optimal plant health and growth. Model performance is evaluated using k-fold cross-validation, showing significant reductions in root mean squared error (RMSE) and mean absolute error (MAE), and improvements in the coefficient of determination (R²). The adaptive alarm mechanism adjusts thresholds based on seasonal changes, enhancing control and responsiveness. This case study demonstrates the practical application of advanced modeling techniques in hydroponics, contributing to improved crop management and productivity, and paving the way for more sustainable agricultural practices.
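The residual-correction idea described in this abstract can be sketched roughly as follows: a primary model predicts the C/N ratio, and a simple AR(1) model (standing in here for the adapted ARIMA) is fit on the primary model's errors, so the combined forecast is the primary prediction plus the predicted residual. The data and the biased "ANN" predictions below are invented purely for illustration; this is not the paper's implementation.

```python
def fit_ar1(series):
    """Least-squares fit of r_t = phi * r_{t-1} for a residual series."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den if den else 0.0

# Toy ground-truth C/N ratios and a systematically low "ANN" prediction.
actual = [10.0, 10.4, 10.9, 11.5, 12.2, 13.0]
ann_pred = [9.5, 9.8, 10.3, 10.8, 11.5, 12.2]

residuals = [a - p for a, p in zip(actual, ann_pred)]
phi = fit_ar1(residuals)

# Correct the final prediction with the AR(1) forecast of its residual.
corrected = ann_pred[-1] + phi * residuals[-2]
err_before = abs(actual[-1] - ann_pred[-1])
err_after = abs(actual[-1] - corrected)
print(err_after < err_before)  # the corrected forecast is closer
```

Because the toy "ANN" errors are strongly autocorrelated, even this one-parameter residual model removes most of the bias, which is the same intuition behind feeding the ANN error back into the ARIMA component.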

Computational linguistics. Natural language processing, Electronic computers. Computer science
DOAJ Open Access 2025
Feature pyramid attention network for audio‐visual scene classification

Liguang Zhou, Yuhongze Zhou, Xiaonan Qi et al.

Abstract Audio‐visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial‐temporal relationships exhibited by audio‐visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. The focus of recent studies has predominantly revolved around extracting features from diverse neural network structures, inadvertently neglecting the acquisition of semantically meaningful regions and crucial components within audio‐visual data. The authors present a feature pyramid attention network (FPANet) for audio‐visual scene understanding, which extracts semantically significant characteristics from audio‐visual data. The authors’ approach builds multi‐scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state‐of‐the‐art (SOTA) approaches, with a 95.9 F1‐score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio‐visual signals.

Computational linguistics. Natural language processing, Computer software
DOAJ Open Access 2025
Enhancing blockchain-based audit data privacy via hybrid chaotic and RSA encryption: mechanism design and performance evaluation

Cheng Zhang

Abstract This paper investigates privacy-protection mechanisms for audit data under blockchain technology and constructs an efficient computational model. The model is based on the distributed-ledger characteristics of blockchain and ensures tamper-resistance and traceability of the data by optimizing the consensus mechanism, which exploits the tamper-proof properties of the blockchain. By building a multi-node collaborative framework that supports batch auditing, the model improves data-synchronization efficiency; the optimization focuses on reducing the time to reach consensus and thereby improving the overall performance and scalability of the blockchain network. At the same time, smart contracts are used to automate data sharing and the auditing process, improving audit efficiency. For the data encryption algorithm, a chaotic system combined with RSA encryption is designed, drawing on the randomness and complexity of chaos theory to further strengthen data security. The results show that the model renders the ciphertext image uncorrelated in all directions and improves the encryption strength of the image. Compared with the baseline methods, the proposed method increases the degree of privacy protection of the audit data and the complexity of the ciphertext image by up to about 50% and 45%, respectively, and reduces data encryption and decryption time by about 20 s. In addition, although the running time of each stage of the algorithm grows with the number of concurrent requests, the system supports 500 concurrent users with a throughput of up to 583.49 requests/s.
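One common way to combine chaos-based and RSA encryption, sketched below as a toy (this is not the paper's actual scheme, and the tiny key and the seed encoding are invented for the demo), is to let textbook RSA protect the chaotic seed while a logistic-map keystream encrypts the data itself. Real use would require large keys, padding, and a cryptographically vetted construction.

```python
def logistic_keystream(x, n_bytes, r=3.99):
    """Generate n_bytes of keystream from the logistic map x <- r*x*(1-x)."""
    out = []
    for _ in range(n_bytes):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)
    return out

# Toy RSA key pair (p=61, q=53): n = 3233, e = 17, d = 2753.
n, e, d = 3233, 17, 2753

seed_int = 1234                       # integer encoding of the chaotic seed
enc_seed = pow(seed_int, e, n)        # RSA-encrypt the seed for the receiver

data = b"audit record #42"
ks = logistic_keystream(seed_int / 10000.0, len(data))
cipher = bytes(b ^ k for b, k in zip(data, ks))

# Receiver recovers the seed with the RSA private key, then decrypts.
rec_seed = pow(enc_seed, d, n) / 10000.0
ks2 = logistic_keystream(rec_seed, len(cipher))
plain = bytes(b ^ k for b, k in zip(cipher, ks2))
print(plain)  # b'audit record #42'
```

The XOR stream makes encryption and decryption symmetric, while RSA handles the key-exchange problem; the chaotic map supplies the "randomness and complexity" the abstract attributes to chaos theory.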

Computational linguistics. Natural language processing, Electronic computers. Computer science
DOAJ Open Access 2025
A Lightweight YOLOv5 Target Detection Model and Its Application to the Measurement of 100‐Kernel Weight of Corn Seeds

Helong Yu, Jiayao Zhao, Chun Guang Bi et al.

ABSTRACT The 100‐kernel weight of corn seed is a crucial metric for assessing corn quality, and current measurement methods mostly involve manually counting kernels and then weighing them on a balance, which is labour‐intensive and time‐consuming. To address the low efficiency of measuring the 100‐kernel weight of corn seeds, this study proposes a measurement method based on deep learning and machine vision. High‐contrast camera technology was used to capture image data of corn seeds, and the feature extraction network of the YOLOv5 model was improved by incorporating the MobileNetV3 network structure. The novel model employs depthwise separable convolution to decrease parameters and computational load, incorporates a linear bottleneck and inverted residual structure to enhance efficiency, introduces an SE attention mechanism for direct learning of channel features, and updates the activation function. Algorithms and experiments were then designed to calculate the 100‐kernel weight from the model's output. The results show that the enhanced model achieved an accuracy of 90.1%, a recall of 91.3%, and a mAP (mean average precision) of 92.2%. While meeting production requirements, the model reduces the number of parameters to 50% of the original model. In an applied study measuring the 100‐kernel weight of corn seeds, the counting accuracy reached 97.18% and the weight-measurement accuracy reached 94.2%. This study achieves both efficient and precise measurement of the 100‐kernel weight of maize seeds, presenting a novel perspective on estimating maize seed weight.

Computational linguistics. Natural language processing, Computer software
DOAJ Open Access 2025
Лексикографічне моделювання неосемантизмів у словниках української мови першої чверті ХХІ століття

Юлія Цигвінцева

Introduction. Modern Ukrainian exhibits the active use of neosemanticisms, i.e. words with a familiar form but a new meaning. The relevance of the study stems from the need for comprehensive lexicographic treatment of such units. The aim of the study is to reveal the principles and methodology of modelling neosemanticisms in dictionaries of the first quarter of the 21st century and to present the author's own electronic "Dictionary of Neosemanticisms". Methods. To achieve this aim, the general scientific methods of induction, deduction, analysis and synthesis were applied, together with linguistic methods: descriptive, comparative, sampling, and the method of lexicographic modelling of lexical-semantic phenomena. Results. The methods and techniques of lexicographic treatment of neosemanticisms in various types of dictionaries of modern Ukrainian are surveyed (dictionaries of neologisms, general explanatory dictionaries, dictionaries of individual authors and of the language of creative personalities, etc.), and the most illustrative examples are presented. The electronic "Dictionary of Neosemanticisms", created on the Lexonomy platform by generalizing the theoretical work and practical findings of Ukrainian lexicographers, is described. The dictionary's word list (730 lexemes) and macrostructure are characterized, covering a description of its concept, its compilation principles and content, a list of sources of illustrative material, and an inventory of grammatical, style, domain, and stylistic labels. Particular attention is paid to the microstructure, which models the functional potential of a neosemanticism: the zones of the headword, grammatical information, labels, definition, illustration and attestation, set phrases, derived units, paradigmatics, and a reference to the older meaning in earlier dictionaries. Sample dictionary entries for neosemanticisms of different parts of speech are presented. Conclusions. Lexicographic modelling of neosemanticisms remains an important and interesting direction of linguistic research, since language practice outpaces codification processes.
The electronic resource created facilitates the recording of neosemanticisms, makes it possible to represent the functioning of lexemes with new meanings compactly and informatively, and serves as a useful tool for analyzing language dynamics. Author information: Yuliia Oleksandrivna Tsyhvintseva, PhD, Junior Research Fellow, Department of Lexicology, Lexicography and Structural-Mathematical Linguistics, Institute of the Ukrainian Language, National Academy of Sciences of Ukraine. Email: tsyhvintseva@nas.gov.ua

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
DOAJ Open Access 2025
OVALYTICS: Enhancing Offensive Video Detection with YouTube Transcriptions and Advanced Language Models

Sneha Chinivar, Roopa M.S., Arunalatha J.S. et al.

The exponential growth of offensive content online underscores the need for robust content moderation. In response, this work presents OVALYTICS (Offensive Video Analysis Leveraging YouTube Transcriptions with Intelligent Classification System), a comprehensive framework that introduces novel integrations of advanced technologies for offensive video detection. Unlike existing approaches, OVALYTICS uniquely combines Whisper AI for accurate audio-to-text transcription with state-of-the-art large language models (LLMs) such as BERT, ALBERT, XLM-R, MPNet, and T5 for semantic analysis. The framework also features a newly curated dataset tailored for fine-grained evaluation, achieving significant improvements in accuracy and F1-scores over traditional methods and advancing the state of automated content moderation.

Computational linguistics. Natural language processing
arXiv Open Access 2025
Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models

Ju-Young Kim, Ji-Hong Park, Se-Yeon Lee et al.

Recent incidents in certain online games and communities, where anonymity is guaranteed, show that unchecked inappropriate remarks frequently escalate into verbal abuse and even criminal behavior, raising significant social concerns. Consequently, there is a growing need for research on techniques that can detect inappropriate utterances within conversational texts to help build a safer communication environment. Although large-scale language models trained on Korean corpora and chain-of-thought reasoning have recently gained attention, research applying these approaches to inappropriate utterance detection remains limited. In this study, we propose a soft inductive bias approach that explicitly defines reasoning perspectives to guide the inference process, thereby promoting rational decision-making and preventing errors that may arise during reasoning. We fine-tune a Korean large language model using the proposed method and conduct both quantitative performance comparisons and qualitative evaluations across different training strategies. Experimental results show that the Kanana-1.5 model achieves an average accuracy of 87.0046, improving by approximately 3.89 percent over standard supervised learning. These findings indicate that the proposed method goes beyond simple knowledge imitation by large language models and enables more precise and consistent judgments through constrained reasoning perspectives, demonstrating its effectiveness for inappropriate utterance detection.

en cs.CL
arXiv Open Access 2025
A Computational Approach to Analyzing Language Change and Variation in the Constructed Language Toki Pona

Daniel Huang, Hyoun-A Joo

This study explores language change and variation in Toki Pona, a constructed language with approximately 120 core words. Taking a computational and corpus-based approach, the study examines features including fluid word classes and transitivity in order to examine (1) changes in preferences of content words for different syntactic positions over time and (2) variation in usage across different corpora. The results suggest that sociolinguistic factors influence Toki Pona in the same way as natural languages, and that even constructed linguistic systems naturally evolve as communities use them.

en cs.CL
arXiv Open Access 2025
GDLLM: A Global Distance-aware Modeling Approach Based on Large Language Models for Event Temporal Relation Extraction

Jie Zhao, Wanting Ning, Yuxiao Fei et al.

In Natural Language Processing (NLP), Event Temporal Relation Extraction (ETRE) is the task of recognizing the temporal relations between two events. Prior studies have noted the importance of language models for ETRE. However, the restricted pre-trained knowledge of Small Language Models (SLMs) limits their capability to handle minority class relations in imbalanced classification datasets. For Large Language Models (LLMs), researchers adopt manually designed prompts or instructions, which may introduce extra noise, leading to interference with the model's judgment of the long-distance dependencies between events. To address these issues, we propose GDLLM, a Global Distance-aware modeling approach based on LLMs. We first present a distance-aware graph structure utilizing a Graph Attention Network (GAT) to assist the LLMs in capturing long-distance dependency features. Additionally, we design a temporal feature learning paradigm based on soft inference to augment the identification of relations within a short-distance proximity band, which supplements the probabilistic information generated by LLMs into the multi-head attention mechanism. Since the global feature can be captured effectively, our framework substantially enhances the performance of minority relation classes and improves the overall learning ability. Experiments on two publicly available datasets, TB-Dense and MATRES, demonstrate that our approach achieves state-of-the-art (SOTA) performance.

en cs.CL, cs.IR
DOAJ Open Access 2024
The Development of Semantic Processing in Children’s Emotional Advancement: An Analysis of the Movie “Inside Out”

Quratulain ., Tanzeel Ur Rehman, Bisma Butt

The study focuses on the development of semantic processing in children through the lens of the Disney animated movie "Inside Out" by Pete Docter and Ronnie Del Carmen. Semantic processing is a basic aspect of human language processing that enables us to extract meaning from words and to comprehend the information conveyed by others. The movie "Inside Out" depicts the story of an 11-year-old girl named Riley and the personification of her emotions Joy, Sadness, Anger, Fear and Disgust as they navigate her life experiences. By analyzing the movie's plot, characters and dialogues, the study investigates (1) how children's semantic processing develops over time, (2) how it relates to the ability to understand abstract concepts and how it can impact children's emotional development, and (3) what role metaphors and analogies play in the semantic development of the children depicted in the movie. The goals of this study are to investigate the content, characters, and themes of the movie, to analyze its use of metaphors and analogies, and to explore the relationship between the development of semantic processing and children's emotional advancement. The study used a mixed-methods design to provide a comprehensive understanding of the phenomenon, targeting two age groups, 5-8 years and 9-12 years, to assess the impact of age on the development of semantic processing. The findings indicate that the movie "Inside Out" successfully engaged the children in reflecting on and discussing their understanding of emotions; participants demonstrated an enhanced awareness of the importance and complexity of emotions. The study also provides insight into how animated movies can be used as a tool to promote semantic development in children's emotional advancement, and can inform the development of educational materials that facilitate semantic processing in children.

Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar
arXiv Open Access 2024
Multi-word Tokenization for Sequence Compression

Leonidas Gee, Leonardo Rigutini, Marco Ernandes et al.

Large Language Models have proven highly successful at modelling a variety of tasks. However, this comes at a steep computational cost that hinders wider industrial uptake. In this paper, we present MWT: a Multi-Word Tokenizer that goes beyond word boundaries by representing frequent multi-word expressions as single tokens. MWTs produce a more compact and efficient tokenization that yields two benefits: (1) Increase in performance due to a greater coverage of input data given a fixed sequence length budget; (2) Faster and lighter inference due to the ability to reduce the sequence length with negligible drops in performance. Our results show that MWT is more robust across shorter sequence lengths, thus allowing for major speedups via early sequence truncation.
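The core idea of merging frequent multi-word expressions into single tokens can be sketched in a few lines; this is a rough illustration of the concept, not the paper's MWT implementation, and the tiny corpus and the `"_"` join convention are invented for the demo.

```python
from collections import Counter

def learn_mwes(corpus, top_k=2):
    """Pick the top_k most frequent adjacent word pairs as merge rules."""
    pairs = Counter()
    for sent in corpus:
        words = sent.split()
        pairs.update(zip(words, words[1:]))
    return [p for p, _ in pairs.most_common(top_k)]

def tokenize(sent, mwes):
    """Greedy left-to-right tokenization that merges learned pairs."""
    words, out, i = sent.split(), [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in mwes:
            out.append(words[i] + "_" + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

corpus = ["new york is big", "i love new york", "new york city"]
mwes = learn_mwes(corpus, top_k=1)
print(tokenize("i love new york", mwes))  # ['i', 'love', 'new_york']
```

The merged sequence is one token shorter than the word-level one, which is the source of both benefits the abstract lists: more content fits in a fixed sequence-length budget, and shorter sequences are cheaper to process.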

en cs.CL, cs.LG
arXiv Open Access 2024
Tamil Language Computing: the Present and the Future

Kengatharaiyer Sarveswaran

This paper delves into the text processing aspects of Language Computing, which enables computers to understand, interpret, and generate human language. Focusing on tasks such as speech recognition, machine translation, sentiment analysis, text summarization, and language modelling, language computing integrates disciplines including linguistics, computer science, and cognitive psychology to create meaningful human-computer interactions. Recent advancements in deep learning have made computers more accessible and capable of independent learning and adaptation. In examining the landscape of language computing, the paper emphasises foundational work like encoding, where Tamil transitioned from ASCII to Unicode, enhancing digital communication. It discusses the development of computational resources, including raw data, dictionaries, glossaries, annotated data, and computational grammars, necessary for effective language processing. The challenges of linguistic annotation, the creation of treebanks, and the training of large language models are also covered, emphasising the need for high-quality, annotated data and advanced language models. The paper underscores the importance of building practical applications for languages like Tamil to address everyday communication needs, highlighting gaps in current technology. It calls for increased research collaboration, digitization of historical texts, and fostering digital usage to ensure the comprehensive development of Tamil language processing, ultimately enhancing global communication and access to digital services.

en cs.CL
arXiv Open Access 2023
Can Chat GPT solve a Linguistics Exam?

Patricia Ronan, Gerold Schneider

The present study asks whether ChatGPT4, the version of ChatGPT which uses the language model GPT4, can successfully solve introductory linguistics exams. Previous exam questions from an Introduction to Linguistics course at a German university are used to test this. The exam questions were fed into ChatGPT4 with only minimal preprocessing. The results show that the language model is very successful in the interpretation even of complex and nested tasks. It proved surprisingly successful at broad phonetic transcription, but performed less well in the analysis of morphemes and phrases. In simple cases it performs sufficiently well, but rarer cases, particularly those lacking one-to-one correspondence, are currently treated with mixed results. The model is not yet able to deal with visualisations, such as the analysis or generation of syntax trees. More extensive preprocessing, which translates these tasks into text data, allows the model to solve these tasks successfully as well.

en cs.CL
DOAJ Open Access 2022
Discovering Lexical Similarity Using Articulatory Feature-Based Phonetic Edit Distance

Tafseer Ahmed, Muhammad Suffian, Muhammad Yaseen Khan et al.

Lexical Similarity (LS) between two languages uncovers many interesting linguistic insights such as phylogenetic relationships, mutual intelligibility, common etymology, and loan words. There are various methods through which LS is evaluated. This paper presents a method of Phonetic Edit Distance (PED) that uses a soft comparison of letters based on the articulatory features associated with their International Phonetic Alphabet (IPA) transcription. In particular, the comparison between the articulatory features of two letters taken from words belonging to different languages is used to compute the replacement cost in the inner loop of the edit-distance computation. As an example, PED gives an edit distance of 0.82 between the German word 'vater' ([fa:tər]) and the Persian word ([pedær]), both meaning 'father', and similarly a PED of 0.93 between the Hebrew word ([ʃəɭam]) and the Arabic word ([səɭa:m]), both meaning 'peace', whereas the classical edit distances would be 4 and 2, respectively. We report the results of systematic experiments conducted on six languages: Arabic, Hindi, Marathi, Persian, Sanskrit, and Urdu. Universal Dependencies (UD) corpora were used to restrict the comparison to lists of words belonging to the same part of speech.
The LS based on the average PED between pairs of words was then computed for each pair of languages, unveiling similarities otherwise masked by the adoption of different alphabets, grammars, and pronunciation rules.
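The mechanism the abstract describes, a standard edit-distance dynamic program whose substitution cost is softened by articulatory-feature overlap, can be sketched as follows. The tiny feature table and the Jaccard-based cost are invented for this demo and are not the authors' feature inventory or cost function.

```python
FEATURES = {               # hypothetical articulatory feature sets
    "f": {"labiodental", "fricative", "voiceless"},
    "p": {"bilabial", "plosive", "voiceless"},
    "a": {"open", "front", "vowel"},
    "e": {"mid", "front", "vowel"},
    "t": {"alveolar", "plosive", "voiceless"},
    "d": {"alveolar", "plosive", "voiced"},
    "r": {"alveolar", "trill", "voiced"},
}

def sub_cost(a, b):
    """Soft substitution cost: 1 minus the Jaccard overlap of feature sets."""
    fa, fb = FEATURES[a], FEATURES[b]
    return 1.0 - len(fa & fb) / len(fa | fb)

def phonetic_edit_distance(s, t):
    """Standard DP edit distance with feature-weighted substitutions."""
    m, n = len(s), len(t)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = float(i)
    for j in range(n + 1):
        dp[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1,                               # deletion
                dp[i][j - 1] + 1,                               # insertion
                dp[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # soft sub
            )
    return dp[m][n]

# Phonetically related words score far below the classical distance of 3.
ped = phonetic_edit_distance("fater", "peder")
print(ped)  # 1.8 with this toy feature table
```

Pairs that share place, manner, or voicing (like t/d) cost only a fraction of a full substitution, which is exactly why cognates written with different letters still come out close under PED.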

Electrical engineering. Electronics. Nuclear engineering
arXiv Open Access 2022
BoAT v2 -- A Web-Based Dependency Annotation Tool with Focus on Agglutinative Languages

Salih Furkan Akkurt, Büşra Marşan, Susan Uskudarli

The value of quality treebanks is steadily increasing due to the crucial role they play in the development of natural language processing tools. The creation of such treebanks is enormously labor-intensive and time-consuming. Especially when the size of treebanks is considered, tools that support the annotation process are essential. Various annotation tools have been proposed, however, they are often not suitable for agglutinative languages such as Turkish. BoAT v1 was developed for annotating dependency relations and was subsequently used to create the manually annotated BOUN Treebank (UD_Turkish-BOUN). In this work, we report on the design and implementation of a dependency annotation tool BoAT v2 based on the experiences gained from the use of BoAT v1, which revealed several opportunities for improvement. BoAT v2 is a multi-user and web-based dependency annotation tool that is designed with a focus on the annotator user experience to yield valid annotations. The main objectives of the tool are to: (1) support creating valid and consistent annotations with increased speed, (2) significantly improve the user experience of the annotator, (3) support collaboration among annotators, and (4) provide an open-source and easily deployable web-based annotation tool with a flexible application programming interface (API) to benefit the scientific community. This paper discusses the requirements elicitation, design, and implementation of BoAT v2 along with examples.

en cs.CL

Page 11 of 408,564