Results for "Computational linguistics. Natural language processing"

DOAJ Open Access 2025

THE BRITISH LABOUR PARTY’S IDEOLOGICAL PIVOT UNDER TONY BLAIR: FROM RADICAL AND DEMOCRATIC SOCIALISM TO SOCIAL DEMOCRACY

Djihed MESSIKH

Abstract: Due to global economic challenges, the long-term impacts of Thatcherism rooted in neoliberalism, and internal divisions within the party, a ‘New Labour’ was born under Tony Blair’s pragmatic Third Way vision, marking a decisive break from traditional policies, in which new market-oriented reforms and welfare-to-work programs were embraced simultaneously. This paper studies the British Labour Party’s ideological shift from radical and democratic socialism to social democracy during the premiership of Tony Blair. The study starts by briefly examining Labour’s socialist origins, with a particular focus on Clause IV and the influence of Marxism and the Fabian Society. Then, it covers the implementation of democratic socialist policies in the post-war era under Clement Attlee and how they were gradually challenged by Margaret Thatcher and the rise of Tony Blair. By analyzing this reorientation, the article highlights the lasting impact of Blair’s leadership on the party’s political identity, offering valuable insights into the broader dynamics of political ideology, electoral strategy, and the future of radical socialist movements worldwide. Keywords: British Labour Party, Radical Socialism, Democratic Socialism, Social Democracy, Tony Blair, Nationalization, Electoral Pragmatism, Third Way.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2025

Optimization of English translation model combining deep learning and attention mechanism and its application in cross-cultural communication

Shengqin Bi

Abstract In today’s globalized context, cross-cultural communication is becoming increasingly frequent, and the importance of English translation is becoming more prominent. Although the traditional Transformer architecture achieves global relevance capture through self-attention, its O(n²) computational complexity leads to a significant semantic gap in long text translation (for example, the error rate of complex sentence structure reaches 37%), and the translation accuracy of culture-loaded words is insufficient (78.4%). In this study, a DAT-NMT model integrating dual attention mechanism and confrontation training is proposed. Through the dynamic adjustment of word-sentence hierarchical attention (BLEU reaches 45.2 when τ = 0.5) and explicit coding of cultural characteristics, the double breakthrough of semantic preservation and cultural adaptation is realized. Experiments show that the model is improved by 23.8% compared with the baseline on the WMT2020 dataset, and the accuracy rate of culturally sensitive words is 92.3%, which provides a new paradigm for cross-cultural translation with both efficiency and accuracy. In cross-cultural communication scenarios, DAT-NMT also performs well, better handling texts rich in cultural connotations, reducing translation errors caused by cultural differences, providing more reliable translation support for cross-cultural communication, and opening up new technical directions for the development of the English translation field.
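
As a rough illustration of why self-attention cost grows quickly with sentence length, the sketch below computes scaled dot-product attention weights in plain Python. It is not the DAT-NMT model from the abstract; the function name `attention_weights` and the toy vectors are assumptions made for this example. The point is only that every token is scored against every other token, so n tokens yield an n × n weight matrix.

```python
import math

def attention_weights(queries, keys):
    """Scaled dot-product attention weights (hypothetical helper, stdlib only).

    For n query vectors and n key vectors this builds an n x n matrix,
    which is where the quadratic cost in sequence length comes from.
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        # One score per (query, key) pair.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Numerically stable softmax over the scores of this query.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy 2-dimensional "token" vectors, invented for illustration.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = attention_weights(tokens, tokens)
print(len(w), len(w[0]))
```

Doubling the number of tokens quadruples the number of scores computed, which is one reason long inputs are expensive for this kind of attention.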

Computational linguistics. Natural language processing, Electronic computers. Computer science

DOAJ Open Access 2025

Machine learning for classifying affective valence from fMRI: a systematic review and meta-analysis

Charith Chitraranjan, Ruwan Dayananda, Dakshitha Suriyaaratchie et al.

Abstract The pleasantness or unpleasantness of psychological states, known as hedonic valence, is considered a fundamental dimension of emotional experiences. Many studies have applied machine learning techniques to predict valence from fMRI data and reported varying levels of accuracy. In this work, we systematically review studies published up to October 2023 that have applied machine learning as a multi-variate pattern analysis approach to classify valence from fMRI trials of healthy adults. In each trial, a participant was presented with a stimulus expected to induce positive or negative valence. Our objectives were to (1) review and summarize selected studies based on attributes such as experimental design and task (complete list of attributes is provided in the text); (2) summarize the accuracy of valence classification; and (3) investigate how the accuracy of valence prediction is influenced by the experimental paradigm. We searched the databases Scopus, PubMed, IEEE Xplore and ACM Digital Library to retrieve relevant studies. Twenty-three studies met the eligibility criteria and were included in the review. We performed a meta-analysis involving 30 observations from 22 of those studies. The meta-analytic summary of the accuracy for classifying positive vs. negative valence was significantly above chance level. Further analysis showed that studies adopting a block design achieve significantly higher classification accuracy than those adopting an event-related design. Based on our experiments comparing popular machine learning models across two datasets, we recommend logistic regression for its simplicity, interpretability, and comparable accuracy to more complex models. However, we suggest that future studies also explore deep learning architectures such as convolutional and graph neural networks, which have not yet been applied to classify valence from fMRI data.

Computational linguistics. Natural language processing, Electronic computers. Computer science

DOAJ Open Access 2025

Augmented human intelligence goes beyond the AGI mirage

Michael Lissack, Brenden Meagher

Abstract This paper argues that contemporary Artificial General Intelligence (AGI) approaches face significant challenges that make Augmented Human Intelligence (AHI) a more promising and practically beneficial alternative. Drawing on phenomenological insights from Peter-Paul Verbeek’s technological mediation theory and comprehensive empirical evidence from decades of AI development, we argue that current AGI projects encounter recurring difficulties that may stem from attempting to replace rather than enhance human intelligence. Through analysis of mediation theory’s core concepts, systematic examination of recent AI performance data, and comprehensive review of human–AI collaboration studies, we show why augmentation approaches consistently outperform replacement attempts across domains requiring creativity, judgment, and contextual understanding. Prioritizing sophisticated AHI development over AGI pursuit represents not a compromise but a fundamental reorientation toward more effective and ethically sound approaches to artificial intelligence that better serve human flourishing.

Computational linguistics. Natural language processing, Electronic computers. Computer science

arXiv Open Access 2025

TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English

Fethi Bougares, Salima Mdhaffar, Haroun Elleuch et al.

In this paper, we introduce TEDxTN, the first publicly available Tunisian Arabic to English speech translation dataset. This work is in line with the ongoing effort to mitigate the data scarcity obstacle for a number of Arabic dialects. We collected, segmented, transcribed and translated 108 TEDx talks following our internally developed annotations guidelines. The collected talks represent 25 hours of speech with code-switching that cover speakers with various accents from over 11 different regions of Tunisia. We make the annotation guidelines and corpus publicly available. This will enable the extension of TEDxTN to new talks as they become available. We also report results for strong baseline systems of Speech Recognition and Speech Translation using multiple pre-trained and fine-tuned end-to-end models. This corpus is the first open source and publicly available speech translation corpus of Code-Switching Tunisian dialect. We believe that this is a valuable resource that can motivate and facilitate further research on the natural language processing of Tunisian Dialect.

en cs.CL, cs.AI

arXiv Open Access 2025

Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

Priyaranjan Pattnayak, Amit Agarwal, Hansa Meghwani et al.

Retrieval-Augmented Generation (RAG) systems and large language model (LLM)-powered chatbots have significantly advanced conversational AI by combining generative capabilities with external knowledge retrieval. Despite their success, enterprise-scale deployments face critical challenges, including diverse user queries, high latency, hallucinations, and difficulty integrating frequently updated domain-specific knowledge. This paper introduces a novel hybrid framework that integrates RAG with intent-based canned responses, leveraging predefined high-confidence responses for efficiency while dynamically routing complex or ambiguous queries to the RAG pipeline. Our framework employs a dialogue context manager to ensure coherence in multi-turn interactions and incorporates a feedback loop to refine intents, dynamically adjust confidence thresholds, and expand response coverage over time. Experimental results demonstrate that the proposed framework achieves a balance of high accuracy (95%) and low latency (180 ms), outperforming RAG and intent-based systems across diverse query types, positioning it as a scalable and adaptive solution for enterprise conversational AI applications.
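
The routing idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' system: the keyword-overlap intent scorer, the `CANNED` and `KEYWORDS` tables, the `rag_answer` stub, and the threshold values are all hypothetical stand-ins chosen to keep the example self-contained.

```python
# Hypothetical canned responses and intent keywords, invented for this sketch.
CANNED = {
    "reset_password": "You can reset your password from the account settings page.",
    "opening_hours": "Support is available 9:00-17:00 on weekdays.",
}
KEYWORDS = {
    "reset_password": {"reset", "password"},
    "opening_hours": {"opening", "hours"},
}

def classify_intent(query):
    """Toy intent scorer: fraction of an intent's keywords found in the query."""
    words = {w.strip("?.,!") for w in query.lower().split()}
    best, best_score = None, 0.0
    for intent, kws in KEYWORDS.items():
        score = len(words & kws) / len(kws)
        if score > best_score:
            best, best_score = intent, score
    return best, best_score

def rag_answer(query):
    """Placeholder for a retrieval-augmented generation pipeline."""
    return f"[RAG pipeline would answer: {query}]"

def route(query, threshold=0.8):
    """Serve a canned response when confidence is high, otherwise fall back to RAG."""
    intent, confidence = classify_intent(query)
    if intent is not None and confidence >= threshold:
        return "canned", CANNED[intent]
    return "rag", rag_answer(query)

def adjust_threshold(threshold, was_helpful, step=0.05):
    """Feedback loop: relax the threshold after helpful canned answers, tighten otherwise."""
    new = threshold - step if was_helpful else threshold + step
    return min(0.99, max(0.5, new))

print(route("How do I reset my password?")[0])
print(route("Compare the two enterprise plans for me")[0])
```

A real deployment would replace the keyword scorer with a trained intent classifier and carry dialogue context between turns; the sketch only shows the threshold-based branch and one simple way a feedback signal could move the threshold.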

en cs.AI

arXiv Open Access 2025

Probing Large Language Models in Reasoning and Translating Complex Linguistic Puzzles

Zheng-Lin Lin, Yu-Fei Shih, Shu-Kai Hsieh

This paper investigates the utilization of Large Language Models (LLMs) for solving complex linguistic puzzles, a domain requiring advanced reasoning and adept translation capabilities akin to human cognitive processes. We explore specific prompting techniques designed to enhance the ability of LLMs to reason and elucidate their decision-making pathways, with a focus on Input-Output Prompting (IO), Chain-of-Thought Prompting (CoT), and Solo Performance Prompting (SPP). Utilizing datasets from the Puzzling Machine Competition and various Linguistics Olympiads, we employ a comprehensive set of metrics to assess the performance of GPT-4 0603, a prominent LLM, across these prompting methods. Our findings illuminate the potential of LLMs in linguistic reasoning and complex translation tasks, highlighting their capabilities and identifying limitations in the context of linguistic puzzles. This research contributes significantly to the broader field of Natural Language Processing (NLP) by providing insights into the optimization of LLM applications for improved reasoning and translation accuracy, thereby enriching the ongoing dialogue in NLP advancements.

en cs.CL

S2 Open Access 2024

Pashto poetry generation: deep learning with pre-trained transformers for low-resource languages

Imran Ullah, Khalil Ullah, Hamad Khan et al.

Generating poetry using machine and deep learning techniques has been a challenging and exciting topic of research in recent years. It has significance in natural language processing and computational linguistics. This study introduces an innovative approach to generate high-quality Pashto poetry by leveraging two pre-trained transformer models, LaMini-Cerebras-590M and bloomz-560m. The models were trained on an extensive new and quality Pashto poetry dataset to learn the underlying complex patterns and structures. The trained models are then used to generate new Pashto poetry by providing them with a seed text or prompt. To evaluate the quality of the generated poetry, we conducted both subjective and objective evaluations, including human evaluation. The experimental results demonstrate that the proposed approach can generate Pashto poetry that is comparable in quality to human-generated poetry. The study provides a valuable contribution to the field of Pashto language and poetry generation and has potential applications in natural language processing and computational linguistics.

2 citations en Computer Science, Medicine

DOAJ Open Access 2024

Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets

Nikita Moghe, Arnisa Fazla, Chantal Amrhein et al.

Computational linguistics. Natural language processing

DOAJ Open Access 2024

Die Relevanz von Kollokationen und ihre Vermittlung im Fremdsprachenunterricht

Erzsébet Pintye-Lukács

One of the most important goals of foreign language teaching is to enable students to use the target language in everyday situations. Developing lexical competence in foreign language learning can play an important role in achieving this goal. The aim of the present study is to call attention to the importance of teaching collocations in foreign language education. The first part of the paper gives an overview about collocations and collocational competence. In the second part of the paper, I would like to focus on principles of phraseology and foreign language teaching methodology that can be used for teaching collocations in the classroom.

Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar

arXiv Open Access 2024

Mapping 'when'-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology

Nilo Pedrazzini

Languages can encode temporal subordination lexically, via subordinating conjunctions, and morphologically, by marking the relation on the predicate. Systematic cross-linguistic variation among the former can be studied using well-established token-based typological approaches to token-aligned parallel corpora. Variation among different morphological means is instead much harder to tackle and therefore more poorly understood, despite being predominant in several language groups. This paper explores variation in the expression of generic temporal subordination ('when'-clauses) among the languages of Latin America and the Caribbean, where morphological marking is particularly common. It presents probabilistic semantic maps computed on the basis of the languages of the region, thus avoiding bias towards the many languages of the world that exclusively use lexified connectors, incorporating associations between character $n$-grams and English $when$. The approach allows capturing morphological clause-linkage devices in addition to lexified connectors, paving the way for larger-scale, strategy-agnostic analyses of typological variation in temporal subordination.
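
To make the notion of an association between character n-grams and English "when" more concrete, here is a minimal sketch. It is not the paper's probabilistic semantic map: it only computes, for each character n-gram, the share of sentences containing it whose English parallel has "when". The function names and the word forms (with a made-up suffix "-pti") are invented for illustration and do not come from any real language or corpus.

```python
from collections import Counter

def char_ngrams(word, n=3):
    """Character n-grams of a word, with '#' marking word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def when_association(pairs, n=3):
    """Relative frequency of 'when' in the English parallel, per character n-gram.

    pairs: iterable of (target_sentence, english_has_when) tuples.
    """
    with_when, total = Counter(), Counter()
    for sentence, has_when in pairs:
        grams = set()
        for word in sentence.split():
            grams.update(char_ngrams(word, n))
        for g in grams:
            total[g] += 1
            if has_when:
                with_when[g] += 1
    return {g: with_when[g] / total[g] for g in total}

# Invented toy data: the suffix "-pti" only occurs where English has "when".
pairs = [
    ("miku tarapti", True),
    ("sona welapti", True),
    ("miku tara", False),
    ("sona wela", False),
]
scores = when_association(pairs)
print(scores["pti"], scores["#ta"])
```

Because the n-grams are drawn from inside words, a sub-word marker such as the invented suffix stands out even though no whole token aligns with "when", which is the intuition behind working below the token level.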

en cs.CL, cs.IR

arXiv Open Access 2024

The Monte Carlo Computational Summit -- October 25 & 26, 2023 -- Notre Dame, Indiana, USA

Joanna Piper Morgan, Alexander Mote, Samuel Lee Pasmann et al.

The Monte Carlo Computational Summit was held on the campus of the University of Notre Dame in South Bend, Indiana, USA on 25--26 October 2023. The goals of the summit were to discuss algorithmic and software alterations required for successfully porting respective code bases to exascale-class computing hardware, compare software engineering techniques used by various code teams, and consider the adoption of industry-standard benchmark problems to better facilitate code-to-code performance comparisons. A large portion of the meeting included candid discussions of direct experiences with approaches that have and have not worked. Participants reported that identifying and implementing suitable Monte Carlo algorithms for GPUs continues to be a sticking point. They also report significant difficulty porting existing algorithms between GPU APIs (specifically Nvidia CUDA to AMD ROCm). To better compare code-to-code performance, participants decided to design a C5G7-like benchmark problem with a defined figure of merit, with the expectation of adding more benchmarks in the future. Problem specifications and results will eventually be hosted in a public repository and will be open to submissions by all Monte Carlo transport codes capable of running the benchmark problem. The participants also identified the need to explore the intermediate and long-term future of the Monte Carlo neutron transport community and how best to modernize and contextualize Monte Carlo as a useful tool in modern industry. Overall the summit was considered to be a success by the organizers and participants, and the group shared a strong desire for future, potentially larger, Monte Carlo summits.

en physics.comp-ph

S2 Open Access 2023

Optimizing sentiment analysis of Nigerian 2023 presidential election using two-stage residual long short term memory.

D. Oyewola, L. Oladimeji, Sowore Olatunji Julius et al.

Sentiment analysis is the process of recognizing positive or negative attitudes in text. This technique makes use of computational linguistics, text analysis, and natural language processing. The 2023 presidential election in Nigeria is a significant event for the country, as it will determine the leader of the nation for the next four years. As such, it is important to understand the sentiment of the public towards the different candidates. In this research, we aimed to understand the sentiment of the public towards the three main candidates in the 2023 presidential election in Nigeria, Atiku, Tinubu, and Obi, by conducting a sentiment analysis on tweets related to the candidates. We used the long short-term memory (LSTM), peephole long short term memory (PLSTM), and two-stage residual long short-term memory (TSRLSTM) models to classify tweets as positive, neutral, or negative. Our dataset consisted of a large number of tweets that were preprocessed to remove noise and irrelevant information. Results showed that TSRLSTM performed excellently well in classifying the tweets and in identifying the sentiment towards each candidate individually. Our findings provide valuable insights into the public's opinion on the candidates and their campaign strategies, which can be useful for researchers, political analysts, and decision-makers. Our study highlights the importance of sentiment analysis in understanding public opinion and its potential applications in the field of political science.

18 citations en Medicine

S2 Open Access 2023

Toward Disambiguating the Definitions of Abusive, Offensive, Toxic, and Uncivil Comments

Pia Pachinger, A. Hanbury, J. Neidhardt et al.

The definitions of abusive, offensive, toxic and uncivil comments used for annotating corpora for automated content moderation are highly intersected and researchers call for their disambiguation. We summarize the definitions of these terms as they appear in 23 papers across different fields. We compare examples given for uncivil, offensive, and toxic comments, attempting to foster more unified scientific resources. Additionally, we stress that the term incivility that frequently appears in social science literature has hardly been mentioned in the literature we analyzed that focuses on computational linguistics and natural language processing.

16 citations en

S2 Open Access 2023

Multimodal sentiment system and method based on CRNN-SVM

Yuxia Zhao, Mahpirat Mamat, A. Aysa et al.

Traditional sentiment analysis focuses on text-level sentiment mining, transforming sentiment mining into classification or regression problems, which results in a low accuracy rate. Sentiment analysis refers to the use of natural language processing, text analysis, and computational linguistics to systematically identify, extract, quantify, and study sentimental states. More scholars have therefore begun to focus on speech recognition and facial expression recognition research, since extracting and analysing people’s sentiment tendencies from these sources can improve sentiment recognition accuracy. Traditional single-modal sentiment analysis can no longer meet people’s needs. This paper therefore proposes a multimodal sentiment analysis method that can obtain more sentimental information sources and help people make better decisions. The experimental results in this paper show that the highest recognition rates of CNN-SVM, RNN-SVM, and CRNN-SVM were 76.8%, 71.2%, and 93.5%, respectively. CRNN-SVM thus has the highest sentiment tendency recognition rate among the deep learning models compared, so it is suitable for the sentiment tendency analysis system designed in this paper. The average accuracy rate of the system designed in this paper was 91%, and the stability was also very strong, which shows that the system designed in this paper is meaningful. The main contribution of this paper starts from the limitations of single-mode emotion analysis: it proposes a multimode emotion analysis method and introduces a convolutional neural network to help people obtain more emotional information sources to meet their needs.

15 citations en Computer Science

S2 Open Access 2023

Automatic genre identification: a survey

Taja Kuzman, Nikola Ljubešić

Automatic genre identification (AGI) is a text classification task focused on genres, i.e., text categories defined by the author’s purpose, common function of the text, and the text’s conventional form. Obtaining genre information has been shown to be beneficial for a wide range of disciplines, including linguistics, corpus linguistics, computational linguistics, natural language processing, information retrieval and information security. Consequently, in the past 20 years, numerous researchers have collected genre datasets with the aim to develop an efficient genre classifier. However, their approaches to the definition of genre schemata, data collection and manual annotation vary substantially, resulting in significantly different datasets. As most AGI experiments are dataset-dependent, a sufficient understanding of the differences between the available genre datasets is of great importance for the researchers venturing into this area. In this paper, we present a detailed overview of different approaches to each of the steps of the AGI task, from the definition of the genre concept and the genre schema, to the dataset collection and annotation methods, and, finally, to machine learning strategies. Special focus is dedicated to the description of the most relevant genre schemata and datasets, and details on the availability of all of the datasets are provided. In addition, the paper presents the recent advances in machine learning approaches to automatic genre identification, and concludes with proposing the directions towards developing a stable multilingual genre classifier.

15 citations en Computer Science

S2 Open Access 2023

Sentiment analysis model for Airline customers’ feedback using deep learning techniques

Hebatullah Samir, Laila A. Abd-Elmegid, Mohamed I. Marie

Sentiment analysis (SA) has recently developed into an automated approach for assessing sentiment, emotion, and opinions in reviews in order to extract relevant and subjective information from text-based data. Analyzing sentiment on social networks, such as Twitter, has become a powerful means of learning about users’ opinions and better understanding their satisfaction. However, distributing and collecting surveys from clients consumes time and energy, and the results are often inaccurate and inconsistent; moreover, evaluating and improving the accuracy of sentiment analysis methods is hindered by the challenges encountered in Natural Language Processing (NLP). This paper uses NLP, text analysis, biometrics, and computational linguistics to detect and extract replies, moods, or emotions from Skytrax Airline Customers' Feedback (SACF) data. This research uses deep learning models to analyze various approaches applied to the small SACF data to solve sentiment analysis problems. We applied word embedding (GloVe embedding models) to improve the sentiment classification performance of a series of datasets extensively utilized for feature extraction. Finally, a comparative study has been conducted on the SACF data analysis utilizing deep learning (DL) for evaluating the performance of the different models and input features, namely Recurrent Neural Networks (RNN), long short-term memory (LSTM), Gated Recurrent Unit (GRU), 1D Convolutional Neural Networks (CONV1D), and Bidirectional Encoder Representations from Transformers (BERT) for application to big datasets in 2019. This approach was assessed using each classification technique; the precision, recall, f1-score, and accuracy metrics for sentiment analysis have been identified. The results show that LSTM outperforms the other models in classification accuracy, at 91%.
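
The four metrics named in this abstract have simple definitions, sketched below for a binary sentiment task. The helper name `binary_metrics` and the label lists are invented for illustration; they are not the SACF data or the authors' evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for one positive class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Toy labels (1 = positive review, 0 = negative review), made up for this example.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can look strong on imbalanced review data, which is why precision, recall, and F1 are usually reported alongside it.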

15 citations en

S2 Open Access 2021

Obituary: Martin Kay

R. Kaplan, H. Uszkoreit

It is with great sadness that we report the passing of Martin Kay in August 2021. Martin was a pioneer and intellectual trailblazer in computational linguistics. He was also a close friend and colleague of many years. Martin was a polyglot undergraduate student of modern and medieval languages at Cambridge University, with a particular interest in translation. He was not (yet) a mathematician or engineer, but idle speculation in 1958 about the possibilities of automating the translation process led him to Margaret Masterman at the Cambridge Language Research Unit, and a shift to a long and productive career. In 1960 he was offered an internship with Dave Hays and the Linguistics Project at The RAND Corporation in California, another early center of research in our emerging discipline. He stayed at RAND for more than a decade, working on basic technologies that are needed for machine processing of natural language. Among his contributions during that period was the development of the first so-called chart parser (Kay 1967), a computationally effective mechanism for dealing systematically with linguistic dependencies that cannot be expressed in context-free grammars. The chart architecture could be deployed for language generation as well as parsing, an important property for Martin’s continuing interest in translation. It was during the years at RAND that Martin found his second calling, as a teacher of computational linguistics, initially at UCLA and then in many other settings. He was a gifted and entertaining speaker and lecturer, able to present complex material with clarity and precision. He took great pleasure in the interactions with his students and the role that he played in helping to advance their careers. He left RAND in 1972 to become a full-time professor and chair of the Computer Science Department at the University of California at Irvine. His time at Irvine was short-lived, as he was attracted back to an open-ended research environment. 
In 1974 he joined with Danny Bobrow, Ron Kaplan, and Terry Winograd to form the Language Understander project at the recently created Palo Alto Research Center (PARC) of the Xerox Corporation. The group took as a first goal the construction of a mixed-initiative dialog system using state-of-the-art components for knowledge representation and reasoning, language understanding, language production, and dialog management (Bobrow et al. 1977). Martin took responsibility for

78 citations en Computer Science

S2 Open Access 2022

Stance-level Sarcasm Detection with BERT and Stance-centered Graph Attention Networks

Yazhou Zhang, Dan Ma, P. Tiwari et al.

Computational Linguistics (CL) associated with the Internet of Multimedia Things (IoMT)-enabled multimedia computing applications brings several research challenges, such as real-time speech understanding, deep fake video detection, emotion recognition, home automation, and so on. Due to the emergence of machine translation, CL solutions have increased tremendously for different natural language processing (NLP) applications. Nowadays, NLP-enabled IoMT is essential for its success. Sarcasm detection, a recently emerging artificial intelligence (AI) and NLP task, aims at discovering sarcastic, ironic, and metaphoric information implied in texts that are generated in the IoMT. It has drawn much attention from the AI and IoMT research community. The advance of sarcasm detection and NLP techniques will provide a cost-effective, intelligent way to work together with machine devices and high-level human-to-device interactions. However, existing sarcasm detection approaches neglect the hidden stance behind texts, thus insufficient to exploit the full potential of the task. Indeed, the stance, i.e., whether the author of a text is in favor of, against, or neutral toward the proposition or target talked in the text, largely determines the text’s actual sarcasm orientation. To fill the gap, in this research, we propose a new task: stance-level sarcasm detection (SLSD), where the goal is to uncover the author’s latent stance and based on it to identify the sarcasm polarity expressed in the text. We then propose an integral framework, which consists of Bidirectional Encoder Representations from Transformers (BERT) and a novel stance-centered graph attention networks (SCGAT). Specifically, BERT is used to capture the sentence representation, and SCGAT is designed to capture the stance information on specific target. Extensive experiments are conducted on a Chinese sarcasm sentiment dataset we created and the SemEval-2018 Task 3 English sarcasm dataset. 
The experimental results prove the effectiveness of the SCGAT framework over state-of-the-art baselines by a large margin.

37 citations en Computer Science

S2 Open Access 2021

Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus

Julien Abadji, Pedro Ortiz Suarez, Laurent Romary et al.

Since the introduction of large language models in Natural Language Processing, large raw corpora have played a crucial role in Computational Linguistics. However, most of these large raw corpora are either available only for English or not available to the general public due to copyright issues. Nevertheless, there are some examples of freely available multilingual corpora for training Deep Learning NLP models, such as the OSCAR and Paracrawl corpora. However, they have quality issues, especially for low-resource languages. Moreover, recreating or updating these corpora is very complex. In this work, we try to reproduce and improve the goclassy pipeline used to create the OSCAR corpus. We propose a new pipeline that is faster, modular, parameterizable, and well documented. We use it to create a corpus similar to OSCAR but larger and based on recent data. Also, unlike OSCAR, the metadata information is at the document level. We release our pipeline under an open source license and publish the corpus under a research-only license.

67 citations en Computer Science
