Results for "Computational linguistics. Natural language processing"

S2 Open Access 2016

Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification

P. Zhou, Wei Shi, Jun Tian et al.

Relation classification is an important semantic processing task in the field of natural language processing (NLP). State-of-the-art systems still rely on lexical resources such as WordNet or NLP systems like dependency parsers and named entity recognizers (NER) to get high-level features. Another challenge is that important information can appear at any position in the sentence. To tackle these problems, we propose Attention-Based Bidirectional Long Short-Term Memory Networks (AttBLSTM) to capture the most important semantic information in a sentence. The experimental results on the SemEval-2010 relation classification task show that our method outperforms most of the existing methods, with only word vectors.

1934 citations en Computer Science

S2 Open Access 2005

Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling

J. Finkel, Trond Grenager, Christopher D. Manning

Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
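
The idea in the abstract above, replacing exact Viterbi decoding with Gibbs sampling under a cooling temperature so that non-local constraints can be scored, can be sketched as follows. This is a minimal illustration and not the authors' system: it has no transition factors, and the function names, the toy scores, the linear cooling schedule, and the consistency bonus are all assumptions made for the example.

```python
import math
import random

def annealed_gibbs(tokens, labels, local_score, bonus=1.0, sweeps=50, seed=0):
    """Resample one label at a time while lowering the temperature."""
    rng = random.Random(seed)
    state = [rng.choice(labels) for _ in tokens]

    def score(i, label):
        # Local evidence for this token plus a non-local reward for agreeing
        # with the current labels of other occurrences of the same token.
        agree = sum(
            1 for j, tok in enumerate(tokens)
            if j != i and tok == tokens[i] and state[j] == label
        )
        return local_score(tokens[i], label) + bonus * agree

    for sweep in range(sweeps):
        # Linear cooling schedule (an assumption), floored to avoid dividing by zero.
        temperature = max(0.05, 1.0 - sweep / sweeps)
        for i in range(len(tokens)):
            logits = [score(i, label) / temperature for label in labels]
            top = max(logits)
            weights = [math.exp(x - top) for x in logits]
            state[i] = rng.choices(labels, weights=weights)[0]
    return state

def toy_local_score(token, label):
    # Hypothetical scores: capitalised tokens look like locations.
    if token[0].isupper():
        return 2.0 if label == "LOC" else 0.0
    return 2.0 if label == "O" else 0.0

tokens = ["Paris", "visited", "Paris"]
result = annealed_gibbs(tokens, ["LOC", "O"], toy_local_score)
```

Because each position is resampled conditioned on the whole current labelling, the agreement term can look at arbitrarily distant tokens, which dynamic programming over local features cannot do.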

3452 citations en Computer Science

DOAJ Open Access 2025

PERTINENCE OU DECLIN : IMPACT DU JOURNALISME NUMERIQUE SUR LA PRODUCTION MEDIATIQUE FACE AUX DEFIS DE L’IA

Fawzi CHERITI & Dalila MEHIRI

The present study examines the intersection of current discussions regarding the transformative effects of digital journalism and the disruptive capabilities of AI-generated content. This investigation analyzes the informational strategies employed by digital media platforms and the different formats in which content and its topics are presented to the public in response to emerging challenges within an AI-dominated environment. The study draws on a dataset of 1086 publications from five leading global digital platforms: Business Insider, Huffington Post, TMZ, Gizmodo, and Mashable. This comparative content analysis explores the evolution of news messaging strategies by theorizing the engagement of these platforms with the pressures and benefits of AI integration. The results reveal important new perspectives on how journalism might adapt to technological disturbance. They show a diversity of content published across digital press platforms, with disparities among these platforms in the size, type, and display methods of content. Text is the most common form of digital journalism, but photos and videos are increasingly highly regarded in the news industry. Our work presents a new way of presenting information through various forms, emphasizing the move towards more significant visual interaction without losing sight of the importance of text. Keywords: Digital journalism, AI challenges, news platforms, content creation, media outlets, digital newsrooms.

Arts in general, Computational linguistics. Natural language processing

DOAJ Open Access 2025

Quantifying extreme opinions on Reddit amidst the 2023 Israeli–Palestinian conflict

Alessio Guerra, Marcello Lepre, Oktay Karakuş

This study investigates the dynamics of extreme opinions on social media during the 2023 Israeli–Palestinian conflict, utilising a comprehensive dataset of over 450,000 posts from four Reddit subreddits (r/Palestine, r/Judaism, r/IsraelPalestine, and r/worldnews). A lexicon-based, unsupervised methodology was developed to measure “extreme opinions” by considering factors such as anger, polarity, and subjectivity. The analysis identifies significant peaks in extremism scores that correspond to pivotal real-life events, such as the IDF’s bombings of Al Quds Hospital and the Jabalia Refugee Camp, and the end of a ceasefire following a terrorist attack. Additionally, this study explores the distribution and correlation of these scores across different subreddits and over time, providing insights into the propagation of polarised sentiments in response to conflict events. By examining the quantitative effects of each score on extremism and analysing word cloud similarities through Jaccard indices, the research offers a nuanced understanding of the factors driving extreme online opinions. Our findings show that posts exhibiting extreme sentiment surged up to 80% (an increase of 0.3 in extremism score above the average of 0.405 at the end of October) during key conflict events. Compared to recent studies that have not explicitly quantified extremism in an unsupervised manner, we contribute to the literature by addressing this gap through a novel extremism score, derived from sentiment polarity, anger, and subjectivity, to analyse Reddit discourse surrounding the 2023 Israel–Palestine conflict. This approach captures the complex interplay between real-world events and online reactions, while acknowledging the inherent challenges of measuring extremism in dynamic social media environments. Our approach also enables scalable monitoring of public sentiment extremity, providing valuable insights for policymakers and conflict researchers.
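
The Jaccard index mentioned in the abstract above compares two word sets by dividing the size of their intersection by the size of their union. A minimal sketch follows; the word lists are invented for illustration and are not taken from the study, and treating two empty sets as identical is a convention chosen here.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two iterables treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(union)

# Hypothetical top-word lists for two subreddits (illustrative only).
words_a = ["ceasefire", "hospital", "attack", "border"]
words_b = ["ceasefire", "attack", "election", "protest"]
similarity = jaccard_index(words_a, words_b)  # 2 shared words out of 6 distinct
```

A value near 1 means the two word clouds share most of their vocabulary; a value near 0 means they barely overlap.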

Computational linguistics. Natural language processing

DOAJ Open Access 2025

Bedil, Khān Ārzū and Mīr Taqī Mīr

Arsalan Ahmad Rathore

Bīdil is considered the greatest and last influential poet of the *Sabk-e-Hindi* (Indian style). The intellectual similarities between Ghālib and Bīdil are widely recognized as a valid research topic. However, little effort has been made to explore the connections between Mīr Taqī Mīr, a famous Urdu poet known as *Khudā-e-Sukhan* (God of Poetry), and Bīdil in terms of thought and style. Even though Mīr lived closer to Bīdil’s time and had more direct access to his works compared to Ghālib, this connection has not been fully explored. Persian literary records clearly mention that Khān Ārzū, a mentor and relative of Mīr, considered himself a student of Bīdil. This paper is the first scholarly attempt to trace the intellectual and artistic links between Mīr and Bīdil through the influence of Khān Ārzū.

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing

DOAJ Open Access 2025

Tibetan Medical Named Entity Recognition Based on Syllable‐Word‐Sentence Embedding Transformer

Jin Zhang, Ziyue Zhang, Lobsang Yeshi et al.

Tibetan medical named entity recognition (Tibetan MNER) involves extracting specific types of medical entities from unstructured Tibetan medical texts. Tibetan MNER provides important data support for the work related to Tibetan medicine. However, existing Tibetan MNER methods often struggle to comprehensively capture multi‐level semantic information, failing to sufficiently extract multi‐granularity features and effectively filter out irrelevant information, which ultimately impacts the accuracy of entity recognition. This paper proposes an improved embedding representation method called syllable–word–sentence embedding. By leveraging features at different granularities and using un‐scaled dot‐product attention to focus on key features for feature fusion, the syllable–word–sentence embedding is integrated into the transformer, enhancing the specificity and diversity of feature representations. The model leverages multi‐level and multi‐granularity semantic information, thereby improving the performance of Tibetan MNER. We evaluate our proposed model on datasets from various domains. The results indicate that the model effectively identified three types of entities in the Tibetan news dataset we constructed, achieving an F1 score of 93.59%, which represents an improvement of 1.24% compared to the vanilla FLAT. Additionally, results from the Tibetan medical dataset we developed show that it is effective in identifying five kinds of medical entities, with an F1 score of 71.39%, which is a 1.34% improvement over the vanilla FLAT.
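
Un-scaled dot-product attention, as named in the abstract above, computes softmax(QKᵀ)V without the usual 1/√d factor. The sketch below shows only that operation in plain Python. How the paper wires syllable, word, and sentence features into it is not stated in the abstract, so the three feature vectors and the choice of query are purely hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def unscaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T) V with no 1/sqrt(d) scaling of the scores."""
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs

# Hypothetical 2-d features at three granularities for one position.
syllable, word, sentence = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
features = [syllable, word, sentence]
fused = unscaled_dot_product_attention([word], features, features)[0]
```

Without the scaling factor, large dot products push the softmax closer to a hard selection, so the fused vector leans more strongly toward the best-matching feature.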

Computational linguistics. Natural language processing, Computer software

DOAJ Open Access 2025

Enhancing sugarcane leaf disease classification using vision transformers over CNNs

Saritha Miryala, Krupa Rasane

Sugarcane is a globally significant crop facing threats from leaf diseases that impact its productivity. Traditional detection methods are often inefficient and time-consuming. This study explores the use of Vision Transformers (ViT) for classifying sugarcane leaf diseases and compares their performance with traditional CNNs. A dataset of 19,926 images across six classes was used to fine-tune both ViT and CNN models. The optimized ViT model achieved a test accuracy of 96.53%, outperforming the CNN models (ResNet50 and VGG16) with accuracies of 91.92% and 92.30%, respectively. These findings demonstrate the superior performance of ViTs over CNNs in early disease detection for sustainable crop management. Future work will focus on expanding the dataset and optimizing model parameters for further improvements in disease classification accuracy.

Computational linguistics. Natural language processing, Electronic computers. Computer science

CrossRef Open Access 2024

Incorporating Feature Interaction into Relation Contrastive Learning for Zero-shot Relation Extraction

Zhengxin Gao, Jianyong Duan, Xian Zhou

en

CrossRef Open Access 2023

Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks

Barack Wanjawa, Lilian Wanzare, Florence Indede et al.

Indigenous African languages are categorized as under-served in Natural Language Processing. They therefore experience poor digital inclusivity and information access. The processing challenge with such languages has been how to use machine learning and deep learning models without the requisite data. The Kencorpus project intends to bridge this gap by collecting and storing text and speech data that is good enough for data-driven solutions in applications such as machine translation, question answering and transcription in multilingual communities. The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya (three dialects of Lumarachi, Lulogooli and Lubukusu). Data collection was done by researchers who were deployed to the various data collection sources such as communities, schools, media, and publishers. The Kencorpus dataset has a collection of 5,594 items, being 4,442 texts of 5.6 million words and 1,152 speech files worth 177 hours. Based on this data, other datasets were also developed, such as Part of Speech tagging sets for Dholuo and the Luhya dialects of 50,000 and 93,000 words tagged respectively. We developed 7,537 Question-Answer pairs from 1,445 Swahili texts and also created a text translation set of 13,400 sentences from Dholuo and Luhya into Swahili. The datasets are useful for downstream machine learning tasks such as model training and translation. Additionally, we developed two proof of concept systems: one for Kiswahili speech-to-text and one machine learning system for the Question Answering task. These systems achieved a word error rate of 18.87% for the former and 80% Exact Match (EM) for the latter. These initial results give great promise to the usability of Kencorpus to the machine learning community.
Kencorpus is one of few public domain corpora for these three low resource languages and forms a basis of learning and sharing experiences for similar works, especially for low resource languages. Challenges in developing the corpus included deficiencies in the data sources, data cleaning challenges, relatively short project timelines and the Coronavirus disease (COVID-19) pandemic that restricted movement and hence the ability to get the data in a timely manner.
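
The two metrics reported in the abstract above can be sketched as follows. Word error rate is taken here in its usual form, word-level edit distance divided by reference length, and Exact Match as the share of predictions equal to the reference after trimming and lowercasing. The abstract does not state the exact normalisation the project used, so that choice and the sample strings are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)

def exact_match(predictions, references):
    """Fraction of predictions equal to the reference after simple normalisation."""
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)

# Illustrative strings only, not drawn from Kencorpus.
wer = word_error_rate("habari ya asubuhi", "habari za asubuhi")  # 1 substitution in 3 words
em = exact_match(["Nairobi", "Kisumu"], ["nairobi", "Mombasa"])  # 1 of 2 answers match
```

Lower is better for word error rate and higher is better for Exact Match, so the two figures in the abstract are not directly comparable.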

11 citations en

DOAJ Open Access 2023

Role of Younas Qayasi in Evolution of Drama on Peshawar Television

Raj Muhammad, Dr. Tahseen Bibi

Drama is a literary genre of fiction. It dates back as far as human history itself and has crossed various cultural boundaries. In Hindustan, various dramatists exhibited their skills and potential in it. When partition took place, the dramatists of KPK in particular made great contributions. One of them, Younas Qayasi, is a well-known dramatist who presented and staged not only Urdu but also Pashto and Hindko dramas. He earned great fame at the national and international level.

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing

S2 Open Access 2022

Closing the NLP Gap: Documentary Linguistics and NLP Need a Shared Software Infrastructure

Luke Gessler

For decades, researchers in natural language processing and computational linguistics have been developing models and algorithms that aim to serve the needs of language documentation projects. However, these models have seen little use in language documentation despite their great potential for making documentary linguistic artefacts better and easier to produce. In this work, we argue that a major reason for this NLP gap is the lack of a strong foundation of application software which can on the one hand serve the complex needs of language documentation and on the other hand provide effortless integration with NLP models. We further present and describe a work-in-progress system we have developed to serve this need, Glam.

11 citations en

DOAJ Open Access 2022

La poesia di Edith Bruck

Raul Mordenti

Computational linguistics. Natural language processing, Epistemology. Theory of knowledge

DOAJ Open Access 2022

The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization

Ildikó Pilán, Pierre Lison, Lilja Øvrelid et al.

We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods. Text anonymization, defined as the task of editing a text document to prevent the disclosure of personal information, currently suffers from a shortage of privacy-oriented annotated text resources, making it difficult to properly evaluate the level of privacy protection offered by various anonymization methods. This paper presents TAB (Text Anonymization Benchmark), a new, open-source annotated corpus developed to address this shortage. The corpus comprises 1,268 English-language court cases from the European Court of Human Rights (ECHR) enriched with comprehensive annotations about the personal information appearing in each document, including their semantic category, identifier type, confidential attributes, and co-reference relations. Compared with previous work, the TAB corpus is designed to go beyond traditional de-identification (which is limited to the detection of predefined semantic categories), and explicitly marks which text spans ought to be masked in order to conceal the identity of the person to be protected. Along with presenting the corpus and its annotation layers, we also propose a set of evaluation metrics that are specifically tailored toward measuring the performance of text anonymization, both in terms of privacy protection and utility preservation. We illustrate the use of the benchmark and the proposed metrics by assessing the empirical performance of several baseline text anonymization models. The full corpus along with its privacy-oriented annotation guidelines, evaluation scripts, and baseline models are available on: https://github.com/NorskRegnesentral/text-anonymization-benchmark.

Computational linguistics. Natural language processing

S2 Open Access 2021

Out-of-vocabulary but not meaningless: Evidence for semantic-priming effects in pseudoword processing.

Daniele Gatti, M. Marelli, Luca Rinaldi

Nonarbitrary phenomena in language, such as systematic association in the form-meaning interface, have been widely reported in the literature. Exploiting such systematic associations, previous studies have demonstrated that pseudowords can be indicative of meaning. However, whether semantic activation from words and pseudowords is supported by the very same processes, activating a common semantic memory system, is currently not known. Here, we take advantage of recent progress in computational linguistics models that make it possible to induce meaning representations for out-of-vocabulary strings of letters via domain-general associative-learning mechanisms applied to natural language. We combined these models with data from priming tasks, in which participants are shown two strings of letters presented sequentially and are then asked to indicate if the latter is a word or a pseudoword. In Experiment 1 we reanalyzed the data of the largest behavioral database on semantic priming, while in Experiment 2 we ran an independent replication on a new language, Italian, controlling for a series of possible confounds. Results were consistent across the two experiments and showed that the prime-word meaning interferes with the semantic pattern elicited by the target pseudoword (i.e., at increasing estimated semantic relatedness between prime word and target pseudoword, participants' reaction times increased and accuracy decreased). These findings indicate that the same associative mechanisms governing word meaning also subserve the processing of pseudowords, suggesting in turn that human semantic memory can be conceived as a distributional system that builds upon a general-purpose capacity of extracting knowledge from complex statistical patterns.

31 citations en Medicine

DOAJ Open Access 2021

Translating English WOMAN IS AN ANIMAL metaphors: Spanish native speakers’ associations with novel metaphors

Kristina Fernandes

Animal metaphors are prevalent across languages and convey a variety of, oftentimes negative, meanings – more so for women than men. In English, for example, both lion and lioness refer to a sexually active, dominant man or woman respectively, but while the former is endowed with positive connotations (courage, strength), the latter evokes negative associations (danger, voracity). There are some animal terms, however, that do not feature in animal metaphors in a certain language, posing the question as to which associations are evoked by those animal terms that are not part of conventional animal metaphors. This paper explores Spanish speakers’ interpretations of mappings of the woman is an animal metaphor that are documented to exist in English but not in Spanish. This was tested with two online questionnaires, one employing open questions and the other one Likert scales presenting possible traits (e.g. quarrelsome, kind, promiscuous), in which Spanish speakers had to judge the animal metaphors which were translated from English. The results show that the novel animal metaphors are mainly associated by Spanish native speakers with negative features, first and foremost with ugliness. Additionally, most of the animal terms convey different meanings in English and Spanish. For example, musaraña, the Spanish equivalent of shrew, is not associated with bad temper and quarrelling, but instead with ugliness and muddleheadedness. Furthermore, the findings reveal significant insecurities in the interpretation of the translated metaphors by the Spanish speakers. These results might be an indication of both the arbitrariness and the stability of associations with different animal species, depending on the speakers’ culture. It also seems that novel animal metaphors mainly provide mental access to unattractiveness as it is a concrete physical feature and might therefore be more accessible than abstract personality traits such as kindness or quarrelsomeness.

Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar

DOAJ Open Access 2021

Learning Methods to Combine Linguistic Indicators: Improving Aspectual Classification and Revealing Linguistic Insights

Eric V. Siegel, Kathleen R. McKeown

Computational linguistics. Natural language processing

DOAJ Open Access 2021

J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features

Dat Ba Nguyen, Martin Theobald, Gerhard Weikum

Computational linguistics. Natural language processing

DOAJ Open Access 2020

In memoria del poeta Claude Vigée

Claude Cazalé Bérard

On 2 October, on the eve of Sukkot, the Jewish festival that recalls the fragility of our earthly dwelling, Claude Vigée passed away: the great French poet, an Alsatian Jew, whom Testo e Senso had the honour of hosting in its pages some years ago as poet, essayist and translator, on the occasion of a round table devoted to Traduzione ed Etica (Translation and Ethics). For him, indeed, the poetic word was always wholly “poéthique” (poetic and ethical at once): ethics against nihilism, against the dogmas of single-track thinking, against the disintegration and destruction of the human in man, against the return of monsters and barbarism; his was a lesson in life, a struggle for life, in the name of life, according to the imperative of Scripture: “I have set before you life and death, blessing and curse; choose life, that you and your descendants may live” (Deuteronomy 30:19).

Computational linguistics. Natural language processing, Epistemology. Theory of knowledge

S2 Open Access 1997

Finite-State Language Processing

Emmanuel Roche, Yves Schabes

461 citations en Computer Science

DOAJ Open Access 2019

Сугестія евфонії української поезії

Олександр Строкаль

M. Yarmolinska's study is devoted to poetic euphony as one of the characteristic features of Ukrainian poetry. The researcher understands euphony as a discipline that analyses the sound composition of verse and establishes the laws of poetic melodiousness. The author applies general linguistic principles to the interpretation of euphony and stresses its importance in the text and the functions it performs. The researcher explains such notions as “phonemic saturation of the text”, “textual accented vocalization”, “textual consonantization” and others, and shows how they interrelate within euphony as a discipline. M. Yarmolinska argues that the law of sound semantics is realized most fully in atypical, unexpected, deliberately created acoustic contexts. The author proposes a set of parameters by which, in her view, euphonic phenomena should be analysed, namely: intonational nuances; the ornamental function; the function of “sound italics”; associative links between words; the creation of a specific sound; sound painting; the creation of symbolic sound; and semantic effects. One euphonic device that, in the researcher's view, performs a suggestive function is alliteration. This device has a long history and is therefore closely tied to magic. M. Yarmolinska examines how alliteration is used in the poetic texts of T. Shevchenko, P. Tychyna, O. Dovhyi, A. Moisiienko and D. Chystiak. As for the poetic works of these authors themselves, we note that the researcher's selection of figures, though it does not claim to be exhaustive, is quite representative, since it allows the linguist to take an analytical cross-section at each stage in the development of the Ukrainian language. About the author: Oleksandr Mykolaiovych Strokal, Candidate of Philological Sciences, assistant at the Department of Ukrainian Language and Applied Linguistics, Institute of Philology, Taras Shevchenko National University of Kyiv (Ukraine).
Email: omstrokal@gmail.com. Review of: Ярмолінська М. В. Милозвучність поетичної мови: Тарас Шевченко, Павло Тичина, Олексій Довгий, Анатолій Мойсієнко, Дмитро Чистяк / ed. Ю. Л. Мосенкіс. Kyiv: Аратта, 2018. 80 pp.

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
