Results for "Computational linguistics. Natural language processing"

DOAJ Open Access 2026

Performance centric review of machine learning techniques for electric vehicle powertrain with battery management and charging systems

R. Prasanna, R. Senthil Kumar, N. S. Bhuvaneswari et al.

Abstract The rapid proliferation of electric vehicles (EVs) necessitates advanced intelligent systems to manage their increasingly complex subsystems, ranging from powertrain optimization to battery management and charging infrastructure. Machine learning (ML) has emerged as a transformative tool capable of addressing these challenges through data-driven prediction, adaptive control, and intelligent decision-making. However, existing reviews remain fragmented, largely limited to individual subsystems, and deficient in standardized performance assessment, unified benchmarking, and analytical understanding of algorithmic appropriateness. These gaps limit the comparability, interpretability, and real-world implementation of machine learning models in electric vehicle infrastructure. This review addresses these shortcomings through a performance-oriented, cross-domain synthesis of machine learning deployments for electric vehicle design, battery systems, and charging infrastructure. Systematically examining 240 peer-reviewed publications using unified statistical metrics such as Root Mean Square Error (RMSE) and Coefficient of Determination (R²), it provides a quantitative benchmarking framework linking algorithmic groups to subsystem performance. The analysis reveals that hybrid deep learning models such as Long Short-Term Memory with Double Deep Q-Network (LSTM-DDQN), Convolutional Neural Network-Bidirectional Gated Recurrent Unit (CNN-BiGRU), and transformer-based frameworks consistently achieve superior accuracy (R² > 0.95, RMSE < 0.05) compared to traditional algorithms. The paper also clarifies why ensemble hybrid deep learning models systematically outperform conventional methods, thus laying the groundwork for subsystem variant optimization and deployment tactics.
Along with consolidation, the survey brings into focus crucial open research issues, including dataset standardization, interpretable models, interfacing with Controller Area Network (CAN)/On-Board Diagnostics (OBD) communication protocols, and federated learning for preserving privacy in electric vehicle networks. Overall, this work advances the field by turning disjoint literature into an analytically informed guide to intelligent, interpretable, and scalable electric vehicles. This review concludes by identifying promising future directions such as explainable artificial intelligence (XAI), digital twins, embedded ML, and federated autonomy that will underpin the next generation of intelligent, scalable, and sustainable EV ecosystems.
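For readers unfamiliar with the two metrics this abstract benchmarks against, the following is a minimal sketch of how RMSE and R² are computed. The function names and the sample state-of-charge values are hypothetical illustrations, not data from the review.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical normalized battery state-of-charge readings and model predictions.
y_true = [0.90, 0.80, 0.70, 0.60]
y_pred = [0.88, 0.81, 0.71, 0.59]

# Check the sample against the accuracy band quoted in the abstract.
meets_band = r2_score(y_true, y_pred) > 0.95 and rmse(y_true, y_pred) < 0.05
print(rmse(y_true, y_pred), r2_score(y_true, y_pred), meets_band)
```

Note that RMSE carries the units of the target variable, so a threshold such as RMSE < 0.05 is only comparable across studies when targets are normalized the same way.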

Computational linguistics. Natural language processing, Electronic computers. Computer science

DOAJ Open Access 2025

Musaddas Shaher Ashob of Dagh Delhvi: Analytical Study

Razia Majeed

Mirza Khan Dagh Dehlvi is considered the last poet of the Dehlvi school of classical Urdu poetry. He was a lover of beautiful faces, and he spent his entire life composing poems of love and affection. In the general perception, there is no despair in his poetry. But the tragedy of the War of Independence was a very difficult time for Dagh Dehlvi, and he expressed it in his poem “Musaddas Shahr Aashob”. This poem is an elegy for the destroyed, bygone Delhi of his times. Moreover, it is a unique piece of art that incorporates references to the contemporary history of that time. “Musaddas Shahr Aashob” reveals significant information about the individual and collective attitudes of the people of Delhi during the War of Independence of 1857 and soon after. In this poem, Nawab Mirza Dagh Dehlvi himself represents the mental state of Delhi’s elite immediately after the War of Independence. Like many other people in Delhi, Dagh blamed the rebels for the destruction of the city. It was very difficult for the people of Delhi to admit the mistakes they had committed in the past. The poem is dominated by feelings of patriotism and love for Delhi. Dagh is a person associated with the culture and glorious past of Delhi. He said that those who attacked Delhi had done what is not permissible in any religion. He then called those rebels “Mata Din” and “Ganga Din”. These words refer to Hinduism. Thus, this poem also strengthens the Two-Nation Theory in the Indian Subcontinent, a theory that grew stronger and stronger in the years that followed as the greatest truth in the history of Indo-Pak.

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing

DOAJ Open Access 2025

L'ENSEIGNEMENT DES STRATEGIES D'APPRENTISSAGE DU LEXIQUE (SAL) SUR L'ACQUISITION DU LEXIQUE DU FRANÇAIS L2: ÉTUDE EXPERIMENTALE DANS LE CONTEXTE UNIVERSITAIRE MAROCAIN

EL JABLY Fouad, BAGHAD Ismail & ZAHIR Mouard

Abstract: Vocabulary learning strategies (stratégies d’apprentissage du lexique, SAL) are a set of methods, tools, and means that help learners optimize the acquisition of the vocabulary of a foreign language. This study seeks to evaluate the impact of SAL training on learning the vocabulary of French as a foreign language (FLE) in the Moroccan university context. A total of 225 students enrolled in the Department of French Language and Literature at the Faculty of Letters and Human Sciences Ben M’sik (FLSHBM) in Casablanca, Morocco, took part in the study. The experimental group studied the target vocabulary after following ad hoc SAL training, whereas the control group learned the same vocabulary by its own means. After a four-month experimental period, we used qualitative and quantitative tools to identify the effects of the training provided and the participants’ degree of satisfaction with it. Among the findings of this study are the pressing need to train learners of L2 French in SAL and the need to make use of information and communication technologies for education (TICE), such as social media and smartphones, rather than neglect them. Keywords: vocabulary learning strategies (SAL), FLE, L2 teaching.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2025

The potential of machine learning in diagnosing neurological and psychiatric diseases: a review

Claudia Ricetti, Luca Carrara, Davide La Torre

Abstract Purpose of the study: Artificial intelligence (AI) is rapidly transforming medical practice, from patient care to diagnosis and treatment personalization. With our work, we aim to explore the application of machine learning (ML) and deep learning (DL) algorithms in the diagnosis of neurological and psychiatric diseases, examining both the benefits and the challenges associated with these new technologies. Findings: The examined technologies have shown considerable success in early diagnosis, as well as in the identification of risk factors and symptom management for diseases such as Alzheimer’s, Parkinson’s and psychiatric disorders. These models could help improve diagnostic accuracy and enable a more personalized therapeutic approach by utilizing large datasets with information such as biomarkers and medical images. However, certain challenges persist, including concerns about data quality, patient privacy and the ethical implications of algorithmic decisions. Summary: Artificial intelligence-based diagnostic methods offer great potential to enhance early diagnosis and, consequently, the management of neurological and psychiatric disorders. To maximise their application, it is essential to ensure transparency and interpretability of the models, which are fundamental for their safe and effective use in medical practice.

Computational linguistics. Natural language processing, Electronic computers. Computer science

arXiv Open Access 2025

Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs

Chenqian Le, Ziheng Gong, Chihang Wang et al.

Large language models (LLMs) have shown great potential in medical question answering (MedQA), yet adapting them to biomedical reasoning remains challenging due to domain-specific complexity and limited supervision. In this work, we study how prompt design and lightweight fine-tuning affect the performance of open-source LLMs on PubMedQA, a benchmark for multiple-choice biomedical questions. We focus on two widely used prompting strategies - standard instruction prompts and Chain-of-Thought (CoT) prompts - and apply QLoRA for parameter-efficient instruction tuning. Across multiple model families and sizes, our experiments show that CoT prompting alone can improve reasoning in zero-shot settings, while instruction tuning significantly boosts accuracy. However, fine-tuning on CoT prompts does not universally enhance performance and may even degrade it for certain larger models. These findings suggest that reasoning-aware prompts are useful, but their benefits are model- and scale-dependent. Our study offers practical insights into combining prompt engineering with efficient finetuning for medical QA applications.
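The contrast between the two prompting strategies named in this abstract can be sketched as a pair of templates. The wording of both templates, the function name build_prompt, and the sample question are hypothetical and are not taken from the paper; the yes/no/maybe answer set follows the PubMedQA label format.

```python
def build_prompt(question, context, style="instruction"):
    """Build a prompt for a PubMedQA-style item (answers: yes, no, maybe).

    The template wording is an illustrative assumption, not the authors' prompt.
    """
    header = f"Context: {context}\nQuestion: {question}\n"
    if style == "instruction":
        # Standard instruction prompt: ask directly for the label.
        return header + "Answer with one of: yes, no, maybe.\nAnswer:"
    if style == "cot":
        # Chain-of-Thought prompt: request intermediate reasoning before the label.
        return header + (
            "Let's think step by step, then give a final answer "
            "of yes, no, or maybe.\nReasoning:"
        )
    raise ValueError(f"unknown prompt style: {style!r}")

# Hypothetical example item.
example = build_prompt(
    question="Does drug X reduce symptom Y?",
    context="A randomized trial compared drug X with placebo.",
    style="cot",
)
print(example)
```

Keeping the shared header identical across styles means that any accuracy difference between the two conditions can be attributed to the final instruction rather than to how the context is presented.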

en cs.CL

arXiv Open Access 2025

Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models

Jennifer D'Souza, Zachary Laubach, Tarek Al Mustafa et al.

This paper presents an exploratory study that harnesses the capabilities of large language models (LLMs) to mine key ecological entities from invasion biology literature. Specifically, we focus on extracting species names, their locations, associated habitats, and ecosystems, information that is critical for understanding species spread, predicting future invasions, and informing conservation efforts. Traditional text mining approaches often struggle with the complexity of ecological terminology and the subtle linguistic patterns found in these texts. By applying general-purpose LLMs without domain-specific fine-tuning, we uncover both the promise and limitations of using these models for ecological entity extraction. In doing so, this study lays the groundwork for more advanced, automated knowledge extraction tools that can aid researchers and practitioners in understanding and managing biological invasions.

en cs.CL, cs.AI

arXiv Open Access 2025

Integrating Linguistics and AI: Morphological Analysis and Corpus development of Endangered Toto Language of West Bengal

Ambalika Guha, Sajal Saha, Debanjan Ballav et al.

Preserving linguistic diversity is necessary as every language offers a distinct perspective on the world. There have been numerous global initiatives to preserve endangered languages through documentation. This paper is part of a project that aims to develop a trilingual (Toto-Bangla-English) language learning application to digitally archive and promote the endangered Toto language of West Bengal, India. This application, designed for both native Toto speakers and non-native learners, aims to revitalize the language by ensuring accessibility and usability through Unicode script integration and a structured language corpus. The research includes detailed linguistic documentation collected via fieldwork, followed by the creation of a morpheme-tagged, trilingual corpus used to train a Small Language Model (SLM) and a Transformer-based translation engine. The analysis covers inflectional morphology such as person-number-gender agreement, tense-aspect-mood distinctions, and case marking, alongside derivational strategies that reflect word-class changes. Script standardization and digital literacy tools were also developed to enhance script usage. The study offers a sustainable model for preserving endangered languages by combining traditional linguistic methodology with AI. This bridge between linguistic research and technological innovation highlights the value of interdisciplinary collaboration for community-based language revitalization.

en cs.CL, cs.AI

DOAJ Open Access 2024

Modeling Regular Polysemy: A Study on the Semantic Classification of Catalan Adjectives

Gemma Boleda, Sabine Schulte im Walde, Toni Badia

Computational linguistics. Natural language processing

DOAJ Open Access 2024

Complicité et duplicité de la communauté internationale dans l’holocauste congolais

René NGAMBELE NSASAY

Abstract: With twelve million people killed in twenty-five years, the Congolese holocaust is now an undeniable fact. From the outset, the Congolese have pointed the finger at Rwanda, Uganda, and Burundi for their direct and active participation in the aggression. What remained was to establish the complicities without which the massacre would not have lasted so long or claimed so many victims. The aim of these pages is to denounce the duplicity of the international community which, under the pretext of working for the return of peace to Congo, in reality sustains the crime through confused resolutions, shaky agreements, and a mission as budget-draining as it is useless. Keywords: duplicity, complicity, holocaust, Congo, international community

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2024

Human Identity In The Shadow of Biotechnological Excesses

Tahar SAFI & Imene AMEUR

Abstract: This research investigates the erosion of human identity due to rapid advancements in biotechnology. These advancements utilize highly sophisticated and rigorous tools that are fundamentally altering what it means to be human. This study argues that advancements in biotechnology, despite their benefits, pose a serious threat to core human values and identity. Scientific exploration and experiments that alter the human form challenge our understanding of what it means to be human. The ability to manipulate physical characteristics raises profound ethical concerns about freedom, dignity, and the sanctity of human life, to the extent that a genetically or cosmetically modified human becomes unrecognizable. In addition, the project of superhumanism seeks to replace the current human being with a new superhuman being, thus eliminating human identity. Keywords: Identity; Biotechnological Revolution; Humanity; Human Sacredness; Posthumanity.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2024

Contrastive adversarial gender debiasing

Nicolás Torres

This research contributes a comprehensive analysis of gender bias within contemporary AI language models, specifically examining iterations of the GPT series, alongside Gemini and Llama. The study offers a systematic investigation, encompassing multiple experiments spanning sentence completions, generative narratives, bilingual analysis, and visual perception assessments. The primary objective is to scrutinize the evolution of gender bias in these models across iterations, explore biases in professions and contexts, and evaluate multilingual disparities. Notably, the analyses reveal a marked evolution in GPT iterations, with GPT4 showcasing significantly reduced or negligible biases, signifying substantial advancements in bias mitigation. Professions and contexts exhibit model biases, indicating associations with specific genders. Multilingual evaluations demonstrate subtle disparities in gender bias tendencies between English and Spanish narratives. To effectively mitigate these biases, we propose a novel Contrastive Adversarial Gender Debiasing (CAGD) method that synergistically combines contrastive learning and adversarial training techniques. The CAGD method enables language models to learn gender-neutral representations while promoting robustness against gender biases, consistently outperforming original and adversarially debiased models across various tasks and metrics. These findings underscore the complexity of gender bias in AI language models, emphasizing the need for continual bias mitigation strategies, such as the proposed CAGD approach, and ethical considerations in AI development and deployment.

Computational linguistics. Natural language processing

DOAJ Open Access 2024

The critical role of HRM in AI-driven digital transformation: a paradigm shift to enable firms to move from AI implementation to human-centric adoption

Ali Fenwick, Gabor Molnar, Piper Frangos

Abstract The rapid advancement of Artificial Intelligence (AI) in the business sector has led to a new era of digital transformation. AI is transforming processes, functions, and practices throughout organizations creating system and process efficiencies, performing advanced data analysis, and contributing to the value creation process of the organization. However, the implementation and adoption of AI systems in the organization is not without challenges, ranging from technical issues to human-related barriers, leading to failed AI transformation efforts or lower than expected gains. We argue that while engineers and data scientists excel in handling AI and data-related tasks, they often lack insights into the nuanced human aspects critical for organizational AI success. Thus, Human Resource Management (HRM) emerges as a crucial facilitator, ensuring AI implementation and adoption are aligned with human values and organizational goals. This paper explores the critical role of HRM in harmonizing AI's technological capabilities with human-centric needs within organizations while achieving business objectives. Our positioning paper delves into HRM's multifaceted potential to contribute toward AI organizational success, including enabling digital transformation, humanizing AI usage decisions, providing strategic foresight regarding AI, and facilitating AI adoption by addressing concerns related to fears, ethics, and employee well-being. It reviews key considerations and best practices for operationalizing human-centric AI through culture, leadership, knowledge, policies, and tools. By focusing on what HRM can realistically achieve today, we emphasize its role in reshaping roles, advancing skill sets, and curating workplace dynamics to accommodate human-centric AI implementation. 
This repositioning involves an active HRM role in ensuring that the aspirations, rights, and needs of individuals are integral to the economic, social, and environmental policies within the organization. This study not only fills a critical gap in existing research but also provides a roadmap for organizations seeking to improve AI implementation and adoption and to humanize their digital transformation journey.

Computational linguistics. Natural language processing, Electronic computers. Computer science

arXiv Open Access 2024

Misgendering and Assuming Gender in Machine Translation when Working with Low-Resource Languages

Sourojit Ghosh, Srishti Chatterjee

This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. We begin by explaining what low-resource languages are, examining the inseparable social and computational factors that create such linguistic hierarchies. We demonstrate through a case study of our mother tongue Bengali, a global language spoken by almost 300 million people but still classified as low-resource, how gender is assumed and inferred in translations to and from the high(est)-resource English when no such information is provided in source texts. We discuss the postcolonial and societal impacts of such errors leading to linguistic erasure and representational harms, and conclude by discussing potential solutions towards uplifting languages by providing them more agency in MT conversations.

en cs.CL

arXiv Open Access 2024

Testing network clustering algorithms with Natural Language Processing

Ixandra Achitouv, David Chavalarias, Bruno Gaume

The advent of online social networks has led to the development of an abundant literature on the study of online social groups and their relationship to individuals' personalities as revealed by their textual productions. Social structures are inferred from a wide range of social interactions. Those interactions form complex, sometimes multi-layered, networks, on which community detection algorithms are applied to extract higher-order structures. The choice of the community detection algorithm is, however, hardly questioned in relation to the cultural production of the individuals they classify. In this work, we assume the entangled nature of social networks and their cultural production to propose a definition of culture-based online social groups as sets of individuals whose online production can be categorized as social group-related. We take advantage of this apparently self-referential description of online social groups with a hybrid methodology that combines a community detection algorithm and a natural language processing classification algorithm. A key result of this analysis is the possibility of scoring community detection algorithms using their agreement with the natural language processing classification. A second result is that we can assign the opinion of a random user at >85% accuracy.
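The idea of scoring a community detection algorithm by its agreement with a text classifier can be sketched as follows. The abstract does not say which agreement measure the authors use; the Rand index below is a stand-in chosen for illustration, and the labels are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of user pairs on which two labelings agree.

    A pair counts as an agreement when both labelings place the two users
    together, or both place them apart. Label names need not match.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Hypothetical labels for six users.
communities = [0, 0, 0, 1, 1, 1]              # from a community detection algorithm
text_classes = ["a", "a", "b", "b", "b", "b"]  # from an NLP classifier on their posts

score = rand_index(communities, text_classes)
print(score)
```

Because the measure compares pairs rather than label names, it does not require the community identifiers and the classifier's classes to be aligned in advance. Candidate algorithms could then be ranked by this score on the same set of users.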

en cs.SI, cs.CL

DOAJ Open Access 2023

Machine Translation at Work

Aljoscha Burchardt, Cindy Tscherwinka, Eleftherios Avramidis et al.

Computational linguistics. Natural language processing

DOAJ Open Access 2023

Perception et signification des passions dans LES FEUX DE LA PLANÈTE de JEAN-BAPTISTE TATI LOUTARD

Wohnouan Marie-Josée DIOUÉ

Abstract: The poetic work Les feux de la planète by Jean-Baptiste Tati LOUTARD (Tati Loutard, 1977, 47 p.) engages two forms of affect, namely amorous passion and amorous jealousy. Each of these forms corresponds to the relation between a positional actant and a transformational actant, whose specification and articulation in terms of attachment / rivalry / emulation focus the attention of the perceiving subject. The resulting junction, categorized as disjunction / conjunction, expresses a tensive modulation built around the taking of a position and, by extension, a point of view that unfolds as aim (flow of attention) and then as grasp (domain of relevance). Keywords: actant, perception, junction / disjunction, tensivity

Arts in general, Computational linguistics. Natural language processing

arXiv Open Access 2023

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer

Qingru Zhang, Dhananjay Ram, Cole Hawkins et al.

Pretrained transformer models have demonstrated remarkable performance across various natural language processing tasks. These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence. However, the (full) attention mechanism incurs high computational cost, quadratic in the sequence length, which is not affordable in tasks with long sequences, e.g., inputs with 8k tokens. Although sparse attention can be used to improve computational efficiency, as suggested in existing work, it has limited modeling capacity and often fails to capture complicated dependencies in long sequences. To tackle this challenge, we propose MASFormer, an easy-to-implement transformer variant with Mixed Attention Spans. Specifically, MASFormer is equipped with full attention to capture long-range dependencies, but only at a small number of layers. For the remaining layers, MASFormer only employs sparse attention to capture short-range dependencies. Our experiments on natural language modeling and generation tasks show that a decoder-only MASFormer model of 1.3B parameters can achieve performance competitive with vanilla transformers with full attention while significantly reducing computational cost (up to 75%). Additionally, we investigate the effectiveness of continual training with long sequence data and how sequence length impacts downstream generation performance, which may be of independent interest.
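The mixed-span idea described in this abstract can be sketched as a per-layer choice of attention mask. This is not the authors' implementation: it assumes a causal sliding window as the sparse attention pattern, and the layer count, window size, and choice of full-attention layer are hypothetical.

```python
def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True when query position i may attend to key position j.

    window=None gives full causal attention; otherwise each query sees only
    the `window` most recent positions (itself included).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            allowed = j <= i  # causal: never attend to future tokens
            if window is not None:
                allowed = allowed and (i - j) < window
            row.append(allowed)
        mask.append(row)
    return mask

def mixed_span_masks(num_layers, full_layers, seq_len, window):
    """Full attention on the layers in `full_layers`, sliding-window elsewhere."""
    return [
        causal_mask(seq_len, None if layer in full_layers else window)
        for layer in range(num_layers)
    ]

# Hypothetical configuration: 4 layers, only the last one uses full attention.
masks = mixed_span_masks(num_layers=4, full_layers={3}, seq_len=6, window=2)

# Number of attended (query, key) pairs per layer, a rough proxy for attention cost.
attended_pairs = [sum(sum(row) for row in mask) for mask in masks]
print(attended_pairs)
```

The count of attended pairs grows quadratically with sequence length on the full-attention layer and only linearly on the windowed layers, which is the source of the savings the abstract describes.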

en cs.CL, cs.LG

arXiv Open Access 2023

Adapting Pre-trained Language Models for Quantum Natural Language Processing

Qiuchi Li, Benyou Wang, Yudong Zhu et al.

The emerging classical-quantum transfer learning paradigm has brought a decent performance to quantum computational models in many tasks, such as computer vision, by enabling a combination of quantum models and classical pre-trained neural networks. However, using quantum computing with pre-trained models has yet to be explored in natural language processing (NLP). Due to the high linearity constraints of the underlying quantum computing infrastructures, existing Quantum NLP models are limited in performance on real tasks. We fill this gap by pre-training a sentence state with complex-valued BERT-like architecture, and adapting it to the classical-quantum transfer learning scheme for sentence classification. On quantum simulation experiments, the pre-trained representation can bring 50% to 60% increases to the capacity of end-to-end quantum models.

en quant-ph, cs.CL

arXiv Open Access 2023

Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding

Bram M. A. van Dijk, Tom Kouwenhoven, Marco R. Spruit et al.

Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.

en cs.CL, cs.AI

arXiv Open Access 2023

Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities

Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime et al.

This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we identify key challenges and opportunities for NLP research in Ethiopia. Furthermore, we provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This repository can be updated periodically with contributions from other researchers. Our objective is to identify research gaps and disseminate the information to NLP researchers interested in Ethiopian languages and encourage future research in this domain.

en cs.CL
