Results for "Computational linguistics. Natural language processing"

DOAJ Open Access 2025

Artificial intelligence-driven personalized space design and implementation in the aging-friendly renovation of smart home

Dan Jiang

Abstract Against the backdrop of a rapid global transition to an aging society, how to use smart home technology to achieve efficient and personalized aging-friendly transformation has become a shared focus of research and practice. This study proposes a space generation and optimization framework integrated with Artificial Intelligence (AI), aiming to create a dynamically adaptive living environment for elderly users in a data-driven manner. First, the study deploys a multi-type sensor system in typical elderly households to collect multi-dimensional data, including behavior trajectories, activity intensity, and spatial usage heat. Subsequently, it uses a Long Short-Term Memory (LSTM) network to model time-series behaviors, extract daily activity patterns, and establish a mapping between behaviors and needs based on these patterns. On this basis, a Reinforcement Learning (RL) mechanism is introduced. By constructing a reward function centered on safety, convenience, and individual preferences, the spatial layout is optimized iteratively. Furthermore, a Conditional Generative Adversarial Network (cGAN) is used to generate spatial sketch designs, which significantly improves the efficiency of interaction and visualization. The study conducts empirical verification in three elderly household samples. Comparative experimental results show that the coverage rate of activity areas increases to 71.6%, the spatial idle rate decreases to 12.9%, and the user satisfaction score rises from 2.9 to 4.7 (out of 5). Meanwhile, the behavior recognition accuracy of the LSTM model reaches 91.8%, and the spatial layout adaptability optimized by RL increases by 22.7%. In addition, the user feedback mechanism effectively promotes continuous optimization, significantly enhancing the system's personalized response capability. Overall, this study achieves two main objectives. First, it proposes an intelligent space generation and optimization method specifically designed for elderly users. Second, it addresses the deficiencies of existing research in dynamic adaptation and personalized design. Both are accomplished through the synergy of behavior modeling and generative optimization. As a result, the study provides a technically practical path for the aging-friendly transformation of smart homes.

Computational linguistics. Natural language processing, Electronic computers. Computer science


DOAJ Open Access 2025

Misogynous Memes Recognition: Training vs Inference Bias Mitigation Strategies

Gianmaria Balducci, Giulia Rizzi, Elisabetta Fersini

In this paper, we address the problem of automatic misogynous meme recognition by dealing with potentially biased elements that could lead to unfair models. In particular, a bias estimation technique is used to identify the textual and visual elements that unintentionally affect the model prediction, and several bias mitigation methods are proposed, investigating two different types of debiasing strategies, i.e., at training time and at inference time. The proposed approaches achieve remarkable results in terms of both prediction and generalization capabilities.

Social Sciences, Computational linguistics. Natural language processing


arXiv Open Access 2025

EuroGEST: Investigating gender stereotypes in multilingual language models

Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou et al.

Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric. We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages. EuroGEST builds on an existing expert-informed benchmark covering 16 gender stereotypes, expanded in this work using translation tools, quality estimation metrics, and morphological heuristics. Human evaluations confirm that our data generation method results in high accuracy of both translations and gender labels across languages. We use EuroGEST to evaluate 24 multilingual language models from six model families, demonstrating that the strongest stereotypes in all models across all languages are that women are 'beautiful', 'empathetic' and 'neat' and men are 'leaders', 'strong, tough' and 'professional'. We also show that larger models encode gendered stereotypes more strongly and that instruction finetuning does not consistently reduce gendered stereotypes. Our work highlights the need for more multilingual studies of fairness in LLMs and offers scalable methods and resources to audit gender bias across languages.

en cs.CL


arXiv Open Access 2025

Analysis of LLM as a grammatical feature tagger for African American English

Rahul Porwal, Alice Rozet, Pryce Houck et al.

African American English (AAE) presents unique challenges in natural language processing (NLP). This research systematically compares the performance of available NLP models--rule-based, transformer-based, and large language models (LLMs)--capable of identifying key grammatical features of AAE, namely Habitual Be and Multiple Negation. These features were selected for their distinct grammatical complexity and frequency of occurrence. The evaluation involved sentence-level binary classification tasks, using both zero-shot and few-shot strategies. The analysis reveals that while LLMs show promise compared to the baseline, they are influenced by biases such as recency and unrelated features in the text such as formality. This study highlights the necessity for improved model training and architectural adjustments to better accommodate AAE's unique linguistic characteristics. Data and code are available.

en cs.CL, cs.AI


CrossRef Open Access 2024

Industrial Classification Algorithm for Enterprises based on XLNET Model

Qi Kang, Dongqiang Wang, Xu Zhang et al.

1 citation en


CrossRef Open Access 2024

ALCG: Chinese Four-clause Compound Sentences Relation Classification Based on LERT Combining CNN and GRU

Zixuan Zhang, Yuan Li

1 citation en


DOAJ Open Access 2024

Unveiling personality traits through Bangla speech using Morlet wavelet transformation and BiG

Md. Sajeebul Islam Sk., Md. Golam Rabiul Alam

Speech is a potent medium for expressing a wide array of psychologically significant attributes. While earlier research on inferring personality traits from user-generated speech has predominantly focused on other languages, prior studies and datasets for automatically assessing personality from Bangla speech are notably absent. In this paper, we aim to bridge this gap by generating speech samples, each associated with a distinct personality profile. These personality impressions are then linked to the OCEAN (Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism) personality traits. To gauge accuracy, human evaluators, unaware of the speaker's identity, assess these five personality factors. Around 90% of the dataset content is sourced from online Bangla newspapers, with the remaining 10% drawn from renowned Bangla novels. We perform feature-level fusion by combining MFCCs with LPC features to form the MELP and MEWLP features. We introduce the MoMF feature extraction method, which combines a Morlet wavelet transformation with MFCC features. We develop two soft voting ensemble models, DistilRo (based on DistilBERT and RoBERTa) and BiG (based on Bi-LSTM and GRU), for personality classification in the speech-to-text and speech modalities, respectively. The DistilRo model achieves an F1 score of 89% in the speech-to-text modality, and the BiG model achieves an F1 score of 90% in the speech modality.
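As a rough illustration of the soft voting idea mentioned in this abstract, the sketch below averages per-class probabilities from several models and picks the class with the highest mean. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `soft_vote`, the equal weighting of models, and the toy probabilities are all hypothetical.

```python
def soft_vote(prob_rows):
    """Average class probabilities across models and return (label, averages).

    prob_rows: one list of class probabilities per model, all of equal length.
    Equal model weights are an assumption made for this illustration.
    """
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    # Mean probability of each class across all models.
    avg = [sum(row[k] for row in prob_rows) / n_models for k in range(n_classes)]
    # The predicted label is the index of the largest averaged probability.
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg


# Hypothetical outputs of two models for a two-class problem.
label, avg = soft_vote([[0.6, 0.4], [0.2, 0.8]])
```

Unlike hard (majority) voting, this keeps each model's confidence, so one confident model can outweigh a hesitant one.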

Computational linguistics. Natural language processing


DOAJ Open Access 2024

Sédimentation linguistique dans le parler des jeunes sétifiens

Feiza AICHOUR

Abstract: The linguistic mixing observed in Algeria has fostered code-switching. Algerian speakers alternate between dialectal Arabic and French within the same utterance, and even within the same lexical unit. Our research focuses on the speech of young people in Sétif. We analyze the lexemes used or coined by young speakers through a sociolinguistic survey conducted at the University of Sétif 2. The results are presented in this article. Keywords: linguistic sedimentation; code-switching; youth speech; young people of Sétif; University of Sétif 2.

Arts in general, Computational linguistics. Natural language processing


DOAJ Open Access 2024

FACILITATING THE EVALUATION OF EFL STUDENTS’ PARAGRAPH WRITING EXAMS THROUGH DEVELOPING AN ANALYTIC RUBRIC

Sarra FELLAHI

Abstract : Teaching writing presents significant challenges, particularly in the evaluation of students' written work. Despite the considerable time that writing teachers invest in reading and comprehending their students’ compositions, they often struggle to provide fair assessments and appropriate scores, especially during examinations. To address these challenges, the present study explores the effectiveness of a paragraph analytic rubric as a tool for assessing students' writing. This rubric aims to unify evaluation criteria and ensure consistent scoring across all students. Developed by the researcher, the rubric has been implemented by second-year EFL writing teachers at the Department of English, Sétif 2 University, during the first semester writing exams over three consecutive years. In this action research project, teachers participated in a questionnaire and focus group discussion that assessed their satisfaction with the paragraph rubric, revealing positive feedback regarding its utility in grading exams. The findings from this research contribute to the establishment of clear assessment criteria for formal exams that require paragraph writing. By employing this rubric, the study seeks to enhance the reliability and fairness of writing assessments, ultimately improving both teaching practices and student learning outcomes in EFL contexts. Keywords: Analytic rubric, Exam Evaluation, Paragraph writing, Rubric creation, Writing teacher.

Arts in general, Computational linguistics. Natural language processing


DOAJ Open Access 2024

Enhancing anemia detection through multimodal data fusion: a non-invasive approach using EHRs and conjunctiva images

Muhammad Ramzan, Muhammad Usman Saeed, Ghulam Ali

Abstract Anemia detection using multimodal approaches leverages the integration of multiple data sources, such as imaging, clinical records, and hematological parameters, to improve diagnostic accuracy. Such methods can capture the complex interplay of factors contributing to anemia, providing a more comprehensive assessment than traditional single-modality techniques. In this research, a novel deep learning multimodal feature fusion approach is proposed for the automated detection of anemia using electronic health records (EHRs) and a conjunctiva image dataset. First, EHR records are preprocessed by selecting the most appropriate features using Random Forest. The features from the conjunctiva images are extracted using RCBAM (Reverse Convolution Block Attention Mechanism). After that, the Grad-CAM algorithm is applied to calculate the pixel percentages of all the features. The outputs of the Random Forest and Grad-CAM algorithms are concatenated to form a multimodal fusion. The important information from the concatenated features is selected with the help of a professional healthcare consultant. Experiments are performed on the textual and image datasets individually and after concatenation. The results show that the proposed model outperforms state-of-the-art methods with an accuracy of 95%. Despite challenges such as class imbalance and computational demands, our findings reveal substantial clinical potential, offering a patient-friendly and accessible diagnostic solution.

Computational linguistics. Natural language processing, Electronic computers. Computer science


DOAJ Open Access 2024

La transition de l’Algérie vers une économie de la connaissance : Pour la mise en place d’une politique éducative performante

Nassima BELAZREG & Zineb Moustiri

Abstract: This contribution takes stock of the efforts made by the Algerian government to develop a knowledge economy. Investment turns out to rest on new information and communication technologies and on human capital, namely young university graduates and researchers. However, several educational shortcomings continue to hinder the transition from an economy based on oil rents to a knowledge economy, namely in school, secondary, and university education. Keywords: society; science; ICT; knowledge economy; innovation; education.

Arts in general, Computational linguistics. Natural language processing


DOAJ Open Access 2024

Enseignement du français à l’UDC : État des lieux et contraintes

Anzim ATTOUMANE HALIDI

Abstract: Problems related to the teaching of French at the University of the Comoros (UDC) have become a real concern among educators. This article presents the current state of French teaching at the UDC. How does it operate? What constraints arise from it? To answer these questions, we adopt a descriptive approach focused on four essential aspects. First, we describe the didactics of French as a foreign language (FLE) and the teaching of French at the University of the Comoros. This description focuses on the objectives, the curriculum, the method adopted, and the teaching materials. Next, we set out the sociolinguistic status of the languages. Then, we discuss the difficulties of teaching French at the UDC. Finally, we propose action strategies to improve it. Keywords: French teaching, functioning, difficulties, didactics, University of the Comoros

Arts in general, Computational linguistics. Natural language processing


DOAJ Open Access 2024

Le « jihe masikoro », un genre chanté traditionnel malgache

Simon Seta RASOLOFOMASY

Abstract: The "jihe" is a traditional oral literary genre of the Masikoro. Geographically, the Masikoro ethnic group is located in southwestern Madagascar. This paper aims to remind the present generation that Malagasy cultural values, such as ancestral wisdom and Masikoro identity, are still preserved in the "jihe", which remains alive to this day. But why have these educational messages not been received by today's young delinquents in this society? The literary quality of the "jihe" is demonstrated by its wealth of figures of speech. The community performs the "jihe" to encourage those who work hard, to enliven traditional festivities, or at burials. Its message educates the population and also serves as a guidepost against wrongdoing within society. When the population takes the educational messages of the sung jihe genre seriously, society lives in peace, since no one thinks of doing harm. Keywords: literary genre, oral, wisdom, message, educational.

Arts in general, Computational linguistics. Natural language processing


CrossRef Open Access 2024

ERDL: Efficient Retrieval Framework Based on Distillation from Large Language Models

Heng Yu, Rui Li, Zheng Zhang et al.

en


arXiv Open Access 2024

A Computational Model for the Assessment of Mutual Intelligibility Among Closely Related Languages

Jessica Nieder, Johann-Mattis List

Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it. Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments. To study mutual intelligibility computationally, we propose a computer-assisted method using the Linear Discriminative Learner, a computational model developed to approximate the cognitive processes by which humans learn languages, which we expand with multilingual semantic vectors and multilingual sound classes. We test the model on cognate data from German, Dutch, and English, three closely related Germanic languages. We find that our model's comprehension accuracy depends on 1) the automatic trimming of inflections and 2) the language pair for which comprehension is tested. Our multilingual modelling approach does not only offer new methodological findings for automatic testing of mutual intelligibility across languages but also extends the use of Linear Discriminative Learning to multilingual settings.

en cs.CL


arXiv Open Access 2024

Generalized Measures of Anticipation and Responsivity in Online Language Processing

Mario Giulianelli, Andreas Opedal, Ryan Cotterell

We introduce a generalization of classic information-theoretic measures of predictive uncertainty in online language processing, based on the simulation of expected continuations of incremental linguistic contexts. Our framework provides a formal definition of anticipatory and responsive measures, and it equips experimenters with the tools to define new, more expressive measures beyond standard next-symbol entropy and surprisal. While extracting these standard quantities from language models is convenient, we demonstrate that using Monte Carlo simulation to estimate alternative responsive and anticipatory measures pays off empirically: New special cases of our generalized formula exhibit enhanced predictive power compared to surprisal for human cloze completion probability as well as ELAN, LAN, and N400 amplitudes, and greater complementarity with surprisal in predicting reading times.
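For orientation, the two standard quantities that this abstract generalizes, next-symbol surprisal and entropy, can be sketched as follows, together with a Monte Carlo estimate of entropy obtained by sampling. This is only an illustrative sketch and does not reproduce the paper's generalized measures; the toy distribution and all function names are assumptions introduced here.

```python
import math
import random


def surprisal(dist, symbol):
    """Surprisal in bits of one symbol under a next-symbol distribution."""
    return -math.log2(dist[symbol])


def entropy(dist):
    """Exact Shannon entropy in bits: the expected surprisal under dist."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mc_entropy(dist, n_samples, seed=0):
    """Monte Carlo entropy estimate: mean surprisal of sampled symbols."""
    rng = random.Random(seed)
    symbols = list(dist)
    weights = [dist[s] for s in symbols]
    draws = rng.choices(symbols, weights=weights, k=n_samples)
    return sum(surprisal(dist, s) for s in draws) / n_samples


# Hypothetical next-symbol distribution for some fixed context.
toy = {"a": 0.5, "b": 0.25, "c": 0.25}
exact = entropy(toy)
estimate = mc_entropy(toy, n_samples=20000)
```

Sampling is unnecessary for a three-symbol toy distribution, but the same averaging pattern applies when the expectation runs over continuations that cannot be enumerated.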

en cs.CL, cs.AI


arXiv Open Access 2024

Language Models as Models of Language

Raphaël Millière

This chapter critically examines the potential contributions of modern language models to theoretical linguistics. Despite their focus on engineering goals, these models' ability to acquire sophisticated linguistic knowledge from mere exposure to data warrants a careful reassessment of their relevance to linguistic theory. I review a growing body of empirical evidence suggesting that language models can learn hierarchical syntactic structure and exhibit sensitivity to various linguistic phenomena, even when trained on developmentally plausible amounts of data. While the competence/performance distinction has been invoked to dismiss the relevance of such models to linguistic theory, I argue that this assessment may be premature. By carefully controlling learning conditions and making use of causal intervention methods, experiments with language models can potentially constrain hypotheses about language acquisition and competence. I conclude that closer collaboration between theoretical linguists and computational researchers could yield valuable insights, particularly in advancing debates about linguistic nativism.

en cs.CL


arXiv Open Access 2024

Quantum Natural Language Processing

Dominic Widdows, Willie Aboumrad, Dohun Kim et al.

Language processing is at the heart of current developments in artificial intelligence, and quantum computers are becoming available at the same time. This has led to great interest in quantum natural language processing, and several early proposals and experiments. This paper surveys the state of this area, showing how NLP-related techniques have been used in quantum language processing. We examine the art of word embeddings and sequential models, proposing some avenues for future investigation and discussing the tradeoffs present in these directions. We also highlight some recent methods to compute attention in transformer models, and perform grammatical parsing. We also introduce a new quantum design for the basic task of text encoding (representing a string of characters in memory), which has not been addressed in detail before. Quantum theory has contributed toward quantifying uncertainty and explaining "What is intelligence?" In this context, we argue that "hallucinations" in modern artificial intelligence systems are a misunderstanding of the way facts are conceptualized: language can express many plausible hypotheses, of which only a few become actual.

en quant-ph, cs.AI


DOAJ Open Access 2023

Efficiency of Machine Translation in the Language Processing Process; Using Context Clues in Finding the [Exact] Meaning of Quranic Words [In Persian]

Zaynab Shams, Sepideh Chehreh

Translation is the transfer of the content of a text from the source language into the target language, which is done by finding semantic equivalents between the two languages. The most important problems facing translation are ambiguities in vocabulary and sentence structure. Under one classification, there are five important types of lexical ambiguity (categorical ambiguities, homophones, homographs, polysemy and transitive ambiguity) and two important types of structural ambiguity (real structural ambiguities and systemic ambiguities). Machine translation (MT), which is part of the computer-based field of natural language processing (NLP) in computational linguistics and artificial intelligence, is considered one of the automatic techniques that convert unstructured text into structured data; by converting text into information, it makes further analysis possible to extract useful information. In this article, which was compiled using the library method, a theoretical plan is proposed to resolve the issues surrounding the meaning of words in the machine translation of the Quran. Its purpose is to help readers better understand the meaning of the words of the Quran by taking advantage of the context clues and styles of the expressions. In the proposed method, a more suitable equivalent word is chosen in the target language by applying the context rule and text mining techniques. In this plan, context is considered at the scale of words, and it can be extended to other types if the conditions are met. In short, the plan has two steps: first, prioritizing (weighting) the words adjacent to each other (any word within the range of verses for which there is a consensus about their simultaneous revelation); and then comparing with homonymous (polysemous) words, and also comparing the equivalents of a word with the equivalents of other words (synonymization).
In order to make the results more accurate, more specifications of the words can be prepared manually, tables that include things such as whether the verses are Meccan or Medinan, the order of revelation of the Surahs, the concepts and interpretations that are mentioned in the meaning of the words of the Qur'an in dictionaries such as Lisan al-Arab by Ibn Manzur and The Book of Vocabulary in the Strange Qur'an by Al-Ragheb Al-Isfahani and so on. Indexing techniques are used to obtain input data. In the pre-processing stage, the data that is less important (Stop Words) (such as “al-lazi (which)”, “al-lati (that is)”, “lam (not)”, “k'ana (was)”, “kaannama (as if)”, etc.) should be removed to get a better output. To change the shape of the data, the diacritic can be removed to make coding easier, and to reduce the sample size, the infix of the words can be used. In order to prepare a record of specifications for each word that is processed as input, based on the rule of context clues, at first, it is necessary to create a tokenizer, to prepare it in the primary data, and in the entire collection of input verses, a weight should be assigned to each word based on the two criteria of spatial proximity and frequency of repetition. The closer the words are to the desired word or the more it is repeated, the more weight is assigned to it, which represents their stronger semantic connection, and vice versa. Naturally, the words that are in the same verse (have the same number of the verse) have a greater influence than the words that are in other verses and at a further distance. In measuring the frequency criterion, weighted frequency (TF/IDF Weight) is used to show the importance of the word in the surah, the value (TF/IDF value) increases proportionally to the number of times a word appears in each surah or set of input verses, and is balanced by the number of verses that are in the Surah and contain the word. 
Finally, it was concluded that by using the contiguity of words and the semantic relations between them, and with the help of text mining techniques, a greater understanding of the vocabulary was obtained, which leads to a more appropriate selection of the equivalent word in the target language.
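The two weighting criteria described in this abstract, frequency (TF/IDF) and spatial proximity, can be sketched roughly as below. This is a minimal sketch, assuming a common textbook TF-IDF variant and an inverse-distance proximity score; the abstract does not give the exact formulas, so the function names, the formulas, and the placeholder tokens are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(verses):
    """Return one {token: weight} dict per verse (each verse is a token list).

    Assumed variant: relative term frequency within the verse, multiplied by
    log(number of verses / number of verses containing the token).
    """
    n = len(verses)
    # Document frequency: how many verses contain each token at least once.
    df = Counter(tok for verse in verses for tok in set(verse))
    weights = []
    for verse in verses:
        counts = Counter(verse)
        total = len(verse)
        weights.append(
            {tok: (c / total) * math.log(n / df[tok]) for tok, c in counts.items()}
        )
    return weights


def proximity_weights(tokens, target_index):
    """Weight each neighbouring position by the inverse of its distance."""
    return {
        i: 1.0 / abs(i - target_index)
        for i in range(len(tokens))
        if i != target_index
    }


# Placeholder tokens standing in for pre-processed words of two verses.
verses = [["a", "b"], ["a", "c"]]
frequency = tf_idf(verses)
nearby = proximity_weights(["x", "y", "z"], target_index=0)
```

A token that occurs in every verse gets weight zero under this variant, which matches the abstract's point that the frequency weight is balanced by how many verses contain the word.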

Philology. Linguistics


DOAJ Open Access 2023

A Statistical Parsing Framework for Sentiment Classification

Li Dong, Furu Wei, Shujie Liu et al.

Computational linguistics. Natural language processing

