Results for "Computational linguistics. Natural language processing"

S2 Open Access 2009

Book Review: Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper

Michael Elhadad

1612 citations en Art, Computer Science

S2 Open Access 2021

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

Junyi Ao, Rui Wang, Long Zhou et al.

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder. Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, hoping to improve the modeling capability for both speech and text. To align the textual and speech information into this unified semantic space, we propose a cross-modal vector quantization approach that randomly mixes up speech/text states with latent units as the interface between encoder and decoder. Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification.
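
The "randomly mixes up speech/text states with latent units" step in the abstract above can be illustrated with a minimal sketch. It assumes that mixing means replacing a random fraction of encoder states with their nearest codebook vectors. The names `codebook`, `mix_probability`, and both functions are hypothetical, and this is not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def nearest_code(state: Vector, codebook: Sequence[Vector]) -> Vector:
    """Return the codebook entry closest to `state` (squared Euclidean distance)."""
    return min(
        codebook,
        key=lambda code: sum((s - c) ** 2 for s, c in zip(state, code)),
    )


def mix_with_latent_units(
    states: Sequence[Vector],
    codebook: Sequence[Vector],
    mix_probability: float = 0.1,  # hypothetical value, not taken from the paper
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Replace each state by its nearest latent unit with probability `mix_probability`."""
    rng = rng or random.Random()
    mixed = []
    for state in states:
        if rng.random() < mix_probability:
            # Quantized position: the decoder would see the shared latent unit.
            mixed.append(list(nearest_code(state, codebook)))
        else:
            # Untouched position: the decoder would see the original state.
            mixed.append(list(state))
    return mixed
```

Because speech and text states would be quantized against the same codebook, the mixed sequence exposes the decoder to units shared by both modalities, which is one plausible reading of the "unified semantic space" the abstract describes.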

265 citations en Computer Science, Engineering

S2 Open Access 2016

Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health

Tim Althoff, Kevin Clark, J. Leskovec

Mental illness is one of the most pressing public health issues of our time. While counseling and psychotherapy can be effective treatments, our knowledge about how to conduct successful counseling conversations has been limited due to lack of large-scale data with labeled outcomes of the conversations. In this paper, we present a large-scale, quantitative study on the discourse of text-message-based counseling conversations. We develop a set of novel computational discourse analysis methods to measure how various linguistic aspects of conversations are correlated with conversation outcomes. Applying techniques such as sequence-based conversation models, language model comparisons, message clustering, and psycholinguistics-inspired word frequency analyses, we discover actionable conversation strategies that are associated with better conversation outcomes.

324 citations en Medicine, Computer Science

CrossRef Open Access 2025

Dotless Arabic Text for Natural Language Processing

Maged S. Al-Shaibani, Irfan Ahmad

Abstract This article introduces a novel representation of Arabic text as an alternative approach for Arabic NLP, inspired by the dotless script of ancient Arabic. We explored this representation through extensive analysis on various text corpora, differing in size and domain, and tokenized using multiple tokenization techniques. Furthermore, we examined the information density of this representation and compared it with the standard dotted Arabic text using text entropy analysis. Utilizing parallel corpora, we also drew comparisons between Arabic and English text analysis to gain additional insights. Our investigation extended to various upstream and downstream NLP tasks, including language modeling, text classification, sequence labeling, and machine translation, examining the implications of both the representations. Specifically, we performed seven different downstream tasks using various tokenization schemes comparing the standard dotted text with dotless Arabic text representations. Performance using both the representations was comparable across different tokenizations. However, dotless representation achieves these results with significant reduction in vocabulary sizes, and in some scenarios showing reduction of up to 50%. Additionally, we present a system that restores dots to the dotless Arabic text. This system is useful for tasks that require Arabic texts as output.
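
The entropy and vocabulary-size comparison described in the abstract above can be sketched as follows. The dot-removal table is a small illustrative subset chosen for this example, not the authors' mapping, and it ignores position-dependent letter forms. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

# Illustrative, partial mapping (an assumption): letters that differ only by
# dots are collapsed onto one shared skeleton character.
DOTLESS = str.maketrans({
    "\u0628": "\u066e",  # beh  -> dotless beh
    "\u062a": "\u066e",  # teh  -> dotless beh
    "\u062b": "\u066e",  # theh -> dotless beh
    "\u062c": "\u062d",  # jeem -> hah
    "\u062e": "\u062d",  # khah -> hah
    "\u0630": "\u062f",  # thal -> dal
    "\u0632": "\u0631",  # zain -> reh
    "\u0634": "\u0633",  # sheen -> seen
    "\u0636": "\u0635",  # dad  -> sad
    "\u0638": "\u0637",  # zah  -> tah
    "\u063a": "\u0639",  # ghain -> ain
})


def remove_dots(text: str) -> str:
    """Map dotted letters in `text` to their dotless counterparts."""
    return text.translate(DOTLESS)


def shannon_entropy(symbols: Iterable[str]) -> float:
    """Shannon entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def vocabulary_size(text: str) -> int:
    """Number of distinct whitespace-separated tokens."""
    return len(set(text.split()))
```

Passing a string to `shannon_entropy` gives character-level entropy, and passing `text.split()` gives word-level entropy. Merging letters can only shrink the vocabulary or leave it unchanged, which matches the direction of the reduction the abstract reports.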

1 citation en

DOAJ Open Access 2025

LA GUERRE D'INDÉPENDANCE AU CŒUR DE L'ÉCRITURE MÉTAPHORIQUE DE MAISSA BEY DANS PIERRE SANG PAPIER OU CENDRE

Faiza MARREF

Abstract: The witness is an Algerian child who narrates Pierre Sang Papier ou Cendre. With words that carry the collective history of the Algerian war, he brings this war of independence back to life before our eyes, tragically and precisely, and strives to bring out this historical confrontation in his account. The narration spans an entire century: it recounts a violent and singular time, filled with loud cries, the sound of artillery, bomber jets, and whistling bullets, and it describes ravaged faces and absent gazes in a vivid, striking narrative that is both lively and tragic. We revisit the history of this war by focusing on each word with strong resonant power and on metaphors that serve rhetorical functions. The analysis relies on a rhetorical approach that rests on a semantic striking force unfolding over the pages. It aims to show how Pierre Sang Papier ou Cendre differs from other historical narratives through its literary value. Keywords: Pierre Sang Papier ou Cendre, metaphorical writing, poetic sensibility, war.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2025

BMWP: the first Bengali math word problems dataset for operation prediction and solving

Sanchita Mondal, Debnarayan Khatua, Sourav Mandal et al.

Abstract Solving math word problems of varying complexities is one of the most challenging and exciting research questions in artificial intelligence (AI), particularly in natural language processing (NLP) and machine learning (ML). Foundational language models such as GPT must be evaluated for intelligence, and solving word problems is a key method for this assessment. These problems become especially difficult when presented in low-resource regional languages such as Bengali. Word problem solving integrates the cognitive domains of language processing, comprehension, and transformation into real-world solutions. During the past decade, advances in AI and machine learning have significantly progressed in addressing this complex issue. Although researchers worldwide have primarily utilized datasets in English and some in Chinese, there has been a lack of standard datasets for low-resource languages such as Bengali. In this pioneering study, we introduce the first Bengali Math Word Problem Benchmark Data Set (BMWP), comprising 8653 word problems. We detail the creation of this dataset and the benchmarking methods employed. Furthermore, we investigate operation prediction from Bengali word problems using state-of-the-art deep learning (DL) techniques. We implemented and compared various standard DL-based neural network architectures, achieving an accuracy of $92 \pm 2\%$. The data set and the code will be available at https://github.com/SanchitaMondal/BMWP.

Computational linguistics. Natural language processing, Electronic computers. Computer science

DOAJ Open Access 2025

DE LA PACIFICATION AU DEVELOPPEMENT : UNE ANALYSE DE CONTENU DU ROLE DE RADIO OKAPI EN RDC (2002-2022)

Michel Kifinda NGOY

Abstract: This study analyzes the content of Radio Okapi news bulletins from 2002 to 2022 to assess the extent to which the station contributed to promoting peace, democracy, and development in the DRC. The analysis relies on categorical content analysis, which classifies information by theme (peace, security, and humanitarian action; politics and democracy; socio-development), by information source (institutional or local), and by territorial coverage. A purposively selected corpus of 300 news bulletin rundowns constitutes the material analyzed. The results reveal an evolution in Radio Okapi's editorial offering over time, with an initial predominance of socio-development news, increased attention to political issues during electoral periods, and territorial coverage that gradually concentrated on Kinshasa. Keywords: Radio Okapi, content analysis, DRC, information, peace, development.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2024

Exploring Effective Reading Strategies: Enhancing Text Comprehension and Pedagogical Approaches in Diverse Learning Settings

Slimane LAKHDARI, Abdelkader MAKHLOUF & Wafa AL-QAISI

Abstract: This research explores the intricate realm of effective reading strategies seeking to understand the text better and pedagogical approaches across diverse learning settings. The research problem identifies complex efficacy in the diversity of reading strategies. Methodologically, it attempts to explore and explain contextual nuances that determine reading practices and their impacts on comprehension. Key areas in study discuss effectiveness of various strategies, assumptions with respect to reading strategies and comprehension. The effectiveness of the practice of reading is dependent on context, and contributions to the understanding of reading processes. The structure of the research goes largely through key findings, assumptions, contextual influences, and contributions, thus offering a broad perspective on including various domains of effective reading strategies. Yet, the study does acknowledge that there are certain in-built limitations in research, thereby reflecting some insight into the complexities and challenges that the researcher may have experienced throughout this study. Keywords: reading strategies, text comprehension, pedagogical approaches, contextual nuances, research synthesis.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2024

Common Flaws in Running Human Evaluation Experiments in NLP

Craig Thomson, Ehud Reiter, Anya Belz

Computational linguistics. Natural language processing

DOAJ Open Access 2024

Psychometric properties of the Bar-On emotional intelligence scale according to the modern theory - using the Rasch model

Sadaoui Meriem, Berghouti Tawfiq & Harrat Ali

Abstract: This study aimed to verify the validity of the Bar-On emotional intelligence scale, which consists of 60 items, using the modern theory of measurement according to the one-parameter Rasch model. It was applied to a sample of 624 students from the University of Laghouat. The results showed that the scale has acceptable psychometric characteristics: the coefficient of vocabulary stability was 0.99, while the stability of individuals' abilities was estimated at 0.78. The scale also showed an acceptable degree of validity according to the ability and difficulty parameters in logit units, and the grading of items differed using the model. Keywords: Psychometry, Modern theory, Rasch model, Emotional intelligence
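
For readers unfamiliar with the one-parameter model named in the abstract above, its standard textbook form is given below. The symbols are conventional notation and are not taken from the abstract: $X_{ni}$ is the response of person $n$ to item $i$, $\theta_n$ is the person's ability, and $b_i$ is the item's difficulty, both in logits.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

When ability equals difficulty ($\theta_n = b_i$), the probability of endorsing the item is 0.5, which is why ability and difficulty can be placed on the same logit scale.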

Arts in general, Computational linguistics. Natural language processing

S2 Open Access 2023

Utilizing Computational Linguistics Tools for Enhanced Poetic Interpretation

Emanuel Luo

This paper explores the functionalities of Python libraries, such as SpaCy and TextBlob, and pairs them with the ChatGPT-4 to undertake an analytical expedition into the realm of poetry. The study found that Python's tools adeptly handle tokenization, sentiment detection, and semantic analysis but are confined to analyzing specific text within the boundaries of their pre-trained models. In contrast, ChatGPT-4 merges advanced natural language processing (NLP) techniques with state-of-the-art machine learning paradigms, enabling a more comprehensive range of analyses covering thematic, imagery, and contextual dimensions. This fusion of traditional literary methodologies with advanced computational techniques illuminates the prospect of deciphering the nuanced linguistic constructs and profound thematic layers crafted by poets, thus offering layered insights into poetry. Acknowledging the restrictions posed by Python's modular libraries, our study was meticulous in the text selection. The paper analyzes an original poem by the author, performing NLP tasks such as sentiment analysis, emotion and tone detection, and dependency parsing using Python. In comparison, ChatGPT-4 showcases its prowess by not only capturing the surface-level natural imagery within the poem but also highlighting its underlying tribute to national pride and the contemplation it invites on the essence of freedom. The paper concludes by discussing the inherent limitations of Python-based NLP and ChatGPT-4 and suggests future research directions to bridge the gap between human intuition and technological innovation for deeper poetic insights.

3 citations en

DOAJ Open Access 2023

A complementary integrated Transformer network for hyperspectral image classification

Diling Liao, Cuiping Shi, Liguo Wang

Abstract The convolutional neural network (CNN) has become one of the most popular deep learning frameworks and has been widely used in hyperspectral image classification tasks. Convolution (Conv) in a CNN uses filter weights to extract features in the local receptive field, and the weight parameters are shared globally, so it focuses more on the high‐frequency information of the image. Unlike Conv, the Transformer can model long‐term dependence between long‐distance features and adaptively focus on different regions. In addition, the Transformer is considered a low‐pass filter, which focuses more on the low‐frequency information of the image. Given the complementary characteristics of Conv and Transformer, the two can be integrated for full feature extraction. Furthermore, the most important image features correspond to the discriminative region, while the secondary image features represent important but easily ignored regions, which are also conducive to the classification of HSIs. In this study, a complementary integrated Transformer network (CITNet) for hyperspectral image classification is proposed. First, three‐dimensional convolution (Conv3D) and two‐dimensional convolution (Conv2D) are used to extract the shallow semantic information of the image. To enhance the secondary features, a channel Gaussian modulation attention module is proposed and embedded between Conv3D and Conv2D. This module not only enhances secondary features but also suppresses the most important and least important features. Then, considering the different and complementary characteristics of Conv and Transformer, a complementary integrated Transformer module is designed. Finally, through a large number of experiments, this study evaluates the classification performance of CITNet and several state‐of‐the‐art networks on five common datasets. The experimental results show that CITNet provides better classification performance than these networks.

Computational linguistics. Natural language processing, Computer software

DOAJ Open Access 2023

La situación de la religión en La cena secreta de Javier Sierra Albert

Bourfané HAMAN ARMAND

Abstract: Our research addresses the situation of religion in La cena secreta, published in 2004. Javier Sierra Albert's work presents the situation in which the Catholic faith finds itself. According to the work, the Catholic religion is in a state of decline, or rather of loss of belief, owing to many factors. Among the elements that favored the decline of the Catholic Church, the author highlights the loss of the ability to interpret images, the advent of rationalism, Plato's Greece, Cleopatra's Egypt, the extravagances coming from the East, the presence of Turks in the Mediterranean, the appearance of a horde of pagans, and Pope Alexander, who supported paganism. According to the author of the work, these aspects left the Catholic Church and Christianity in general in a worse situation. Keywords: the situation of religion, the Catholic faith, the decline of biblical values.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2022

University Oriental College Lahore in Light of History and Culture

Imran Ali, Muhammad Ijaz Tabassum

The Oriental College is the first educational institution of the Government of Punjab where Oriental languages are taught. Established in 1870, the institution was founded to acquaint people with modern fields of knowledge through Oriental languages. At that time, Arabic, Persian, Sanskrit, Hindi, Urdu, Punjabi (Gurmukhi), and Pashto were taught at the College, along with modern subjects such as Engineering, Mathematics, Geography, Economics, Philosophy, Muslim Law, Dharam Shaster, Medicine, Tibb Unani, Vedak, and History. M.A. classes were started at Oriental College beginning with Arabic, followed by M.A. Sanskrit in 1888, Persian in 1921, Urdu in 1948, Punjabi in 1970, and Kashmir Studies in 1987.

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing

DOAJ Open Access 2022

Conceptual normalization of senar terminological neologisms

ASSANVO Amoikon Dyhie, COULIBALY Pezon Inza

Abstract: Terminologies’ Conceptual Normalization (P. I. Coulibaly, 2018, p. 429), a linguistic approach of terminology normalization within the theoretical framework of optimal terminology (P. I. Coulibaly, 2016) is based on the morphological (K. O. Yéo, 2012) and syntactic rules (A. D. Assanvo, 2010) of the language to be enriched with specific terminologies for a given terminological domain. In fact, the conceptual normalization is applied to the education domain in order to formalize into orthographic scripture the educational concepts’ denominations in senar language for teaching/learning senar language in Côte d’Ivoire modern educational system. The objective of Optimal Terminology Conceptual Normalization is the adaptation of the «mental lexicon» (I. Plag, 2003) collected from senar speaking community to the senar specialized lexicon of teaching/learning domain.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2021

The Interaction of Knowledge Sources in Word Sense Disambiguation

Mark Stevenson, Yorick Wilks

Computational linguistics. Natural language processing

DOAJ Open Access 2021

Word Sense Clustering and Clusterability

Diana McCarthy, Marianna Apidianaki, Katrin Erk

Computational linguistics. Natural language processing

DOAJ Open Access 2021

Перекладні іншомовно-українські словники в історії української фразеографії

Володимир Чумак, Лариса Корнієнко

The article analyzes the specifics of compiling foreign-language-to-Ukrainian translation dictionaries of phraseology from the late 20th and early 21st centuries, establishes the principles of their structural parameterization, identifies the characteristic features of the macrostructure and microstructure of the dictionaries studied, determines how headword units are systematized and the means and methods of representing ethnolinguistic information in translation-type phraseological works, and notes applied aspects of using translation dictionaries of phraseology. About the authors: Volodymyr Vasylovych Chumak, Candidate of Philological Sciences, professor, deputy director for research at the Ukrainian Lingua-Information Fund of the National Academy of Sciences of Ukraine (Ukraine). Email: chumak@nas.gov.ua. Larysa Mykolaivna Korniienko, Candidate of Philological Sciences, English teacher at School of Levels I–III No. 233, Obolon district, Kyiv (Ukraine). Email: larisa.kornienko@ukr.net

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing

S2 Open Access 2020

Overview of methods for automatic natural language text processing

S. Belov, D. Zrelova, P. Zrelov et al.

This paper provides a brief overview of modern methods and approaches used for automatic processing of text information. In English-language literature, this area of science is called NLP (Natural Language Processing). The name itself suggests that the subject of analysis (and, for many tasks, of synthesis) is material presented in one of the natural languages (and, for a number of tasks, in several languages simultaneously), i.e. the national languages people use to communicate. Programming languages are not included in this group. In Russian-language literature, this area is called computer (or mathematical) linguistics. NLP (computational linguistics) usually includes speech analysis along with text analysis, but speech analysis is not considered in this review. The review draws on materials from original works, monographs, and a number of articles published in the «Open Systems.DBMS» journal.

15 citations en Computer Science

S2 Open Access 2020

Applications of Natural Language Processing in Bilingual Language Teaching: An Indonesian-English Case Study

Zara Maxwell-Smith, Simón González Ochoa, Ben Foley et al.

Multilingual corpora are difficult to compile and a classroom setting adds pedagogy to the mix of factors which make this data so rich and problematic to classify. In this paper, we set out methodological considerations of using automated speech recognition to build a corpus of teacher speech in an Indonesian language classroom. Our preliminary results (64% word error rate) suggest these tools have the potential to speed data collection in this context. We provide practical examples of our data structure, details of our piloted computer-assisted processes, and fine-grained error analysis. Our study is informed and directed by genuine research questions and discussion in both the education and computational linguistics fields. We highlight some of the benefits and risks of using these emerging technologies to analyze the complex work of language teachers and in education more generally.
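
The word error rate quoted in the abstract above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch follows. It assumes whitespace tokenization and no text normalization, which may differ from the authors' evaluation setup, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a 64% rate means that about 64 word-level edits are needed per 100 reference words. The value can exceed 100% when the hypothesis contains many insertions.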

8 citations en Computer Science
