Semantic Scholar Open Access 2025

Fine-Grained Arabic Offensive Language Classification with Taxonomy, Sentiment, and Emotions

N. Vanetik, Marina Litvak, Chaya Liebeskind

Abstract

Offensive language detection in Arabic is a challenging task because of the unique linguistic and cultural characteristics of the Arabic language. This study introduces a high-quality annotated dataset for classifying offensive language in Arabic, based on a structured taxonomy that categorizes offensive content across seven levels and captures both explicit and implicit expressions. Using this taxonomy, we re-annotate the FARAD-500 dataset, creating reFarad-500, which provides fine-grained labels for offensive texts in Arabic. A thorough dataset analysis reveals key patterns in the distribution of offensive language, emphasizing the importance of target type, offense severity, and linguistic structures. Additionally, we assess text classification techniques to evaluate the dataset’s effectiveness, exploring the impact of sentiment analysis and emotion detection on classification performance. Our findings highlight the complexity of Arabic offensive language and underscore the necessity of extensive annotation frameworks for accurate detection. This paper advances Arabic natural language processing (NLP) in resource-constrained settings by enhancing the recognition of hate speech and fostering a deeper understanding of the linguistic and emotional dimensions of offensive language.

Citation

Vanetik, N., Litvak, M., Liebeskind, C. (2025). Fine-Grained Arabic Offensive Language Classification with Taxonomy, Sentiment, and Emotions. https://doi.org/10.26615/978-954-452-105-9-013

Journal Information

Publication Year: 2025
Language: en
Source Database: Semantic Scholar
DOI: 10.26615/978-954-452-105-9-013
Access: Open Access ✓