Hasil "Computational linguistics. Natural language processing"

DOAJ Open Access 2025

STRATÉGIES DE COMMUNICATION DE LA LUTTE CONTRE LE PALUDISME EN RÉPUBLIQUE DÉMOCRATIQUE DU CONGO : ANALYSES ET PERSPECTIVES

estor LUTUMBA MBUYI & Felix Emmanuel LUKUSA MULUMBA KATAKULA

Résumé : Le paludisme étant un problème de santé publique, les mécanismes de lutter contre ce fléau ne passent pas seulement par la prévention et la prise en charge, mais aussi par la communication. Celle–ci est un moteur puissant pour engendrer des changements, car elle renferme le processus de transfert, de partage d’idées, de faits ou d’opinions avec quelqu’un, plusieurs personnes ou avec quelque chose. Grâce à la méthode d’analyse de contenu et avec la dynamique des soins de santé primaires qui vise la résolution des problèmes de santé de la communauté, en assurant les services de promotion, de prévention, de soins et de réadaptation nécessaires, nous proposons des stratégies innovantes et en phase avec les évolutions actuelles. Celles-ci intègrent la notion de l’intelligence artificielle pour booster la lutte contre le paludisme à la suite des stratégies existantes utilisées par le Programme National de Lutte contre le Paludisme ainsi que ses divers partenaires. Mots-clés : Communication, Paludisme et Soins de Santé Primaires.

Arts in general, Computational linguistics. Natural language processing

Detail DOI Sumber

CrossRef Open Access 2024

Correcting Factual Errors in LLMs via Inference Paths Based on Knowledge Graph

Weiqi Ye, Qiang Zhang, Xian Zhou et al.

2 sitasi en

Detail DOI Sumber

DOAJ Open Access 2024

INTERPRETING METAPHORICAL LANGUAGE: A CHALLENGE TO ARTIFICIAL INTELLIGENCE

Inna V. Skrynnikova

In recent years, numerous studies have pointed to the ability of artificial intelligence (AI) to generate and analyze expressions of natural language. However, the question of whether AI is capable of actually interpreting human language, rather than imitating its understanding, remains open. Metaphors, being an integral part of human language, as both a common figure of speech and the predominant cognitive mechanism of human reasoning, pose a considerable challenge to AI systems. Based on an overview of the existing studies findings in computational linguistics and related fields, the paper identifies a number of problems associated with the interpretation of non-literal expressions of language by large language models (LLM). It reveals that there is still no clear understanding of the methods for training language models to automatically recognize and interpret metaphors that would bring it closer to the level of human “interpretive competencies”. The purpose of the study is to identify possible reasons that hinder the understanding of figurative language by artificial systems and to outline possible directions for solving this problem. The study suggests that the main barriers to AI’s human-like interpretation of figurative natural language are the absence of a physical body, the inability to reason by analogy and make inferences based on common sense, the latter being both the result and the cognitive process in extracting and processing information. The author concludes that further improvement of the AI systems creative skills should be at the top of the research agenda in the coming years.

Language and Literature

Detail DOI Sumber

DOAJ Open Access 2024

Semantic Field "Catasprophe" in Alternative Worldviews: A Quantitative Dimension

Олександр Колесник

The paper addresses language units that constitute semantic space "catastrophe" within alternative English-mediated worldviews. Designation units synonymic or contextually related to catastrophe / disaster, were chosen from the Web 2021 (enTenTen21) corpus suggested by the Sketch Engine and from the custom corpus of present-day rock lyrics processed via the Ant.Conc 3.5.8 tools. The content of the respective concept is considered through the prism of a logical model that addresses an entropic irrevocable transformation of an open system. He paper focuses on the comparative analysis of the space's composition in two worldview variants. The structure of the semantic space is identified as a field, i.e. a dynamic volumetric continuum that sports multi-level organization as well as each level's zonal segmentation. Special attention is paid to functional-semantic and semiotic properties of the space-field's components. The article employs an interdisciplinary approach that encompasses the myth-oriented semiosis theory and broad inter-systemic analogies ("M-logic"). Key words: system, alternative world, semantic space, corpus, myth Стаття розглядає мовні одиниці, які складають семантичний простір "катастрофа" у структурі альтернативних картин світів, актуалізованих засобами англійської мови. Номінативні одиниці, синонімічні або контекстуально пов’язані з "катастрофою", були відібрані з корпусу Web 2021 (enTenTen21), запропонованого Sketch Engine, та з користувацького корпусу сучасних рок-пісень, оброблених за допомогою інструментів Ant.Conc 3.5.8. Зміст відповідного поняття розглядається крізь призму логічної моделі, яка стосується ентропійної безповоротної трансформації відкритої системи, що призводить до її зламу. У статті здійснено порівняльний аналіз композиції простору у двох варіантах картини світу. Структура семантичного простору ідентифікується як поле, тобто динамічний об’ємний континуум, який має багаторівневу організацію, а також зональну сегментацію кожного рівня. Особливу увагу приділено функціонально-семантичним і семіотичним властивостям компонентів простору-поля. У статті використано міждисциплінарний підхід, який охоплює міфоорієнтовану теорію семіозису та широкі міжсистемні аналогії («М-логіка»). Ключові слова: система, альтернативний світ, семантичний простір, корпус, міф

Discourse analysis, Computational linguistics. Natural language processing

Detail DOI Sumber

DOAJ Open Access 2024

Appréciation des bibliothèques de l’Université de KINSHASA

Didier MATITI MBUTA, Balbine KILENDA LWENDO, Carmel NZEBA MBAYA et Jérémie LUHEMO MALUNGIDI

Résumé : Bien que l’administration coloniale ait laissé, un peu partout en Afrique, des bibliothèques, il n’en reste pas moins que la bibliothéconomie africaine, celle qui est faite par les africains, est et demeure une bibliothéconomie de gestion des bibliothèques. Or, la République démocratique du Congo, confronté avec les problèmes urgents de l’eau, de l’électricité, de l’alimentation, de la santé pour tous, et j’en passe, n’a pas encore dégagé les moyens d’une véritable politique de la bibliothéconomie nationale. Pourtant, elles constituent un ensemble du patrimoine documentaire et culturel de la nation congolaise, au même titre que les objets d’art, les trésors des musées et la rumba. Par conséquent, elles méritent une attention particulière des autorités et de l’ensemble de la communauté universitaire. Mots-clés : République démocratique du Congo, Université de Kinshasa, Bibliothèque universitaire.

Arts in general, Computational linguistics. Natural language processing

Detail DOI Sumber

arXiv Open Access 2024

CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models

Zaid Sheikh, Antonios Anastasopoulos, Shruti Rijhwani et al.

Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models. This could present a significant obstacle for language community members and linguists to use NLP tools. This paper introduces the CMU Linguistic Annotation Backend, an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models. CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages, even with limited training data. We describe various tools and APIs that are currently available and how developers can easily add new models/functionality to the framework. Code is available at https://github.com/neulab/cmulab along with a live demo at https://cmulab.dev

en cs.CL

Detail Sumber

arXiv Open Access 2024

Construction of a Japanese Financial Benchmark for Large Language Models

Masanori Hirano

With the recent development of large language models (LLMs), models that focus on certain domains and languages have been discussed for their necessity. There is also a growing need for benchmarks to evaluate the performance of current LLMs in each domain. Therefore, in this study, we constructed a benchmark comprising multiple tasks specific to the Japanese and financial domains and performed benchmark measurements on some models. Consequently, we confirmed that GPT-4 is currently outstanding, and that the constructed benchmarks function effectively. According to our analysis, our benchmark can differentiate benchmark scores among models in all performance ranges by combining tasks with different difficulties.

en q-fin.CP, cs.CL

Detail Sumber

arXiv Open Access 2024

Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?

Neelabh Sinha, Vinija Jain, Aman Chadha

The rapid rise of Language Models (LMs) has expanded their use in several applications. Yet, due to constraints of model size, associated cost, or proprietary restrictions, utilizing state-of-the-art (SOTA) LLMs is not always feasible. With open, smaller LMs emerging, more applications can leverage their capabilities, but selecting the right LM can be challenging as smaller LMs do not perform well universally. This work tries to bridge this gap by proposing a framework to experimentally evaluate small, open LMs in practical settings through measuring semantic correctness of outputs across three practical aspects: task types, application domains, and reasoning types, using diverse prompt styles. It also conducts an in-depth comparison of 10 small, open LMs to identify the best LM and prompt style depending on specific application requirements using the proposed framework. We also show that if selected appropriately, they can outperform SOTA LLMs like DeepSeek-v2, GPT-4o, GPT-4o-mini, Gemini-1.5-Pro, and even compete with GPT-4o.

en cs.CL, cs.AI

Detail Sumber

arXiv Open Access 2024

Min-Max Framework for Majorization-Minimization Algorithms in Signal Processing Applications: An Overview

Astha Saini, Petre Stoica, Prabhu Babu et al.

This monograph presents a theoretical background and a broad introduction to the Min-Max Framework for Majorization-Minimization (MM4MM), an algorithmic methodology for solving minimization problems by formulating them as min-max problems and then employing majorization-minimization. The monograph lays out the mathematical basis of the approach used to reformulate a minimization problem as a min-max problem. With the prerequisites covered, including multiple illustrations of the formulations for convex and non-convex functions, this work serves as a guide for developing MM4MM-based algorithms for solving non-convex optimization problems in various areas of signal processing. As special cases, we discuss using the majorization-minimization technique to solve min-max problems encountered in signal processing applications and min-max problems formulated using the Lagrangian. Lastly, we present detailed examples of using MM4MM in ten signal processing applications such as phase retrieval, source localization, independent vector analysis, beamforming and optimal sensor placement in wireless sensor networks. The devised MM4MM algorithms are free of hyper-parameters and enjoy the advantages inherited from the use of the majorization-minimization technique such as monotonicity.

en eess.SP

Detail DOI Sumber

DOAJ Open Access 2023

Source Language Adaptation Approaches for Resource-Poor Machine Translation

Pidong Wang, Preslav Nakov, Hwee Tou Ng

Computational linguistics. Natural language processing

Detail DOI Sumber

DOAJ Open Access 2023

Weakly supervised machine learning

Zeyu Ren, Shuihua Wang, Yudong Zhang

Abstract Supervised learning aims to build a function or model that seeks as many mappings as possible between the training data and outputs, where each training data will predict as a label to match its corresponding ground‐truth value. Although supervised learning has achieved great success in many tasks, sufficient data supervision for labels is not accessible in many domains because accurate data labelling is costly and laborious, particularly in medical image analysis. The cost of the dataset with ground‐truth labels is much higher than in other domains. Therefore, it is noteworthy to focus on weakly supervised learning for medical image analysis, as it is more applicable for practical applications. In this review, the authors give an overview of the latest process of weakly supervised learning in medical image analysis, including incomplete, inexact, and inaccurate supervision, and introduce the related works on different applications for medical image analysis. Related concepts are illustrated to help readers get an overview ranging from supervised to unsupervised learning within the scope of machine learning. Furthermore, the challenges and future works of weakly supervised learning in medical image analysis are discussed.

Computational linguistics. Natural language processing, Computer software

Detail DOI Sumber

arXiv Open Access 2023

When a Language Question Is at Stake. A Revisited Approach to Label Sensitive Content

Stetsenko Daria

Many under-resourced languages require high-quality datasets for specific tasks such as offensive language detection, disinformation, or misinformation identification. However, the intricacies of the content may have a detrimental effect on the annotators. The article aims to revisit an approach of pseudo-labeling sensitive data on the example of Ukrainian tweets covering the Russian-Ukrainian war. Nowadays, this acute topic is in the spotlight of various language manipulations that cause numerous disinformation and profanity on social media platforms. The conducted experiment highlights three main stages of data annotation and underlines the main obstacles during machine annotation. Ultimately, we provide a fundamental statistical analysis of the obtained data, evaluation of models used for pseudo-labelling, and set further guidelines on how the scientists can leverage the corpus to execute more advanced research and extend the existing data samples without annotators' engagement.

en cs.CL

Detail Sumber

arXiv Open Access 2023

The ACL OCL Corpus: Advancing Open Science in Computational Linguistics

Shaurya Rohatgi, Yanxia Qin, Benjamin Aw et al.

We present ACL OCL, a scholarly corpus derived from the ACL Anthology to assist Open scientific research in the Computational Linguistics domain. Integrating and enhancing the previous versions of the ACL Anthology, the ACL OCL contributes metadata, PDF files, citation graphs and additional structured full texts with sections, figures, and links to a large knowledge resource (Semantic Scholar). The ACL OCL spans seven decades, containing 73K papers, alongside 210K figures. We spotlight how ACL OCL applies to observe trends in computational linguistics. By detecting paper topics with a supervised neural model, we note that interest in "Syntax: Tagging, Chunking and Parsing" is waning and "Natural Language Generation" is resurging. Our dataset is available from HuggingFace (https://huggingface.co/datasets/WINGNUS/ACL-OCL).

en cs.CL, cs.DL

Detail Sumber

arXiv Open Access 2023

A Deep Learning System for Domain-specific Speech Recognition

Yanan Jia

As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems are proposed. However, commercial ASR systems usually have poor performance on domain-specific speech especially under low-resource settings. The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specific ASR systems. The domain-specific data are collected using proposed semi-supervised learning annotation with little human intervention. The best performance comes from a fine-tuned Wav2Vec2-Large-LV60 acoustic model with an external KenLM, which surpasses the Google and AWS ASR systems on benefit-specific speech. The viability of using error prone ASR transcriptions as part of spoken language understanding (SLU) is also investigated. Results of a benefit-specific natural language understanding (NLU) task show that the domain-specific fine-tuned ASR system can outperform the commercial ASR systems even when its transcriptions have higher word error rate (WER), and the results between fine-tuned ASR and human transcriptions are similar.

en cs.CL, cs.LG

Detail Sumber

arXiv Open Access 2023

Natural language processing on customer note data

Andrew Hilditch, David Webb, Jozef Baca et al.

Automatic analysis of customer data for businesses is an area that is of interest to companies. Business to business data is studied rarely in academia due to the sensitive nature of such information. Applying natural language processing can speed up the analysis of prohibitively large sets of data. This paper addresses this subject and applies sentiment analysis, topic modelling and keyword extraction to a B2B data set. We show that accurate sentiment can be extracted from the notes automatically and the notes can be sorted by relevance into different topics. We see that without clear separation topics can lack relevance to a business context.

en cs.CL

Detail Sumber

arXiv Open Access 2022

A Survey of Active Learning for Natural Language Processing

Zhisong Zhang, Emma Strubell, Eduard Hovy

In this work, we provide a survey of active learning (AL) for its applications in natural language processing (NLP). In addition to a fine-grained categorization of query strategies, we also investigate several other important aspects of applying AL to NLP problems. These include AL for structured prediction tasks, annotation cost, model learning (especially with deep neural models), and starting and stopping AL. Finally, we conclude with a discussion of related topics and future directions.

en cs.CL

Detail Sumber

S2 Open Access 2021

DHQ: Digital Humanities Quarterly: Developing Geographically Oriented NLP Approaches to Sixteenth–Century Historical Documents: Digging into Early Colonial Mexico

Diego Jiménez-Badillo, Patricia Murrieta-Flores, Bruno Martins et al.

5 sitasi en

Detail Sumber

DOAJ Open Access 2021

Discourse Constraints for Document Compression

James Clarke, Mirella Lapata

Computational linguistics. Natural language processing

Detail DOI Sumber

DOAJ Open Access 2021

Unsupervised Quality Estimation for Neural Machine Translation

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya et al.

Computational linguistics. Natural language processing

Detail DOI Sumber

DOAJ Open Access 2021

Spelling Error Patterns in Brazilian Portuguese

Priscila A. Gimenes, Norton T. Roman, Ariadne M. Carvalho

Computational linguistics. Natural language processing

Detail DOI Sumber

Hasil untuk "Computational linguistics. Natural language processing"