DOAJ Open Access 2024

A combined AraBERT and Voting Ensemble classifier model for Arabic sentiment analysis

Dhaou Ghoul, Jérémy Patrix, Gaël Lejeune, Jérôme Verny

Abstract

For sentiment analysis of short texts (e.g., movie reviews or tweets), one approach is to build machine learning models that determine their tone (positive, negative, or neutral). However, such natural language processing (NLP) studies are scarce for languages such as Arabic, which lack large-scale, high-quality training data. In this paper, we present three machine learning models, developed for a Kaggle competition, that classify the sentiment of Arabic tweets. The first is a Voting Ensemble classifier that takes advantage of both character-level and word-level features. The second is an AraBERT (Arabic Bidirectional Encoder Representations from Transformers) model with preprocessing by the Farasa Segmenter. The third combines the first two approaches: a Voting Ensemble classifier using AraBERT embeddings. According to the reported performance measures, all three models improve on previous efforts. The third model exhibits strong performance, with an F-score of 73.98%. The work presented here could be useful for future studies and for new online Arabic sentiment analysis services or competitions.
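To make the idea of a voting ensemble over character-level and word-level features concrete, here is a minimal standard-library sketch. It is not the authors' implementation: the abstract does not say which base classifiers or voting rule they used, so the small Naive Bayes members, the hard (majority) vote, the n-gram sizes, and the four toy sentences are all assumptions chosen for illustration. The AraBERT and Farasa steps are not covered.

```python
from collections import Counter, defaultdict
import math


def char_ngrams(text, n=3):
    """Character-level features: overlapping n-grams of the space-padded string."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_tokens(text):
    """Word-level features: whitespace-separated tokens."""
    return text.split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes over one feature view (add-one smoothing).

    A stand-in base classifier; the paper's actual members are not stated.
    """

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(self.featurize(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)
            for feat in self.featurize(text):
                score += math.log((counts[feat] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def hard_vote(predictions):
    """Majority vote; a tie goes to the earliest prediction in the list."""
    counts = Counter(predictions)
    best = max(counts.values())
    return next(p for p in predictions if counts[p] == best)


class VotingEnsemble:
    """Fits every member on the same data and combines them by hard vote."""

    def __init__(self, members):
        self.members = members

    def fit(self, texts, labels):
        for member in self.members:
            member.fit(texts, labels)
        return self

    def predict(self, text):
        return hard_vote([m.predict(text) for m in self.members])


# Invented toy data, not taken from the paper's dataset.
texts = ["فيلم رائع جدا", "خدمة ممتازة", "فيلم سيء جدا", "خدمة رديئة"]
labels = ["positive", "positive", "negative", "negative"]

ensemble = VotingEnsemble([
    NaiveBayes(char_ngrams),                    # character trigrams
    NaiveBayes(lambda t: char_ngrams(t, n=2)),  # character bigrams
    NaiveBayes(word_tokens),                    # word unigrams
]).fit(texts, labels)
```

Calling `ensemble.predict(text)` returns the label chosen by most members. Using an odd number of members avoids ties in the two-class case; a soft vote that averages class probabilities would be an equally plausible design.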

Topics & Keywords

Computational linguistics. Natural language processing

Authors (4)

Dhaou Ghoul

Jérémy Patrix

Gaël Lejeune

Jérôme Verny

Citation

Ghoul, D., Patrix, J., Lejeune, G., & Verny, J. (2024). A combined AraBERT and Voting Ensemble classifier model for Arabic sentiment analysis. https://doi.org/10.1016/j.nlp.2024.100100

View at source: doi.org/10.1016/j.nlp.2024.100100

Journal Information

Publication Year: 2024
Source Database: DOAJ
DOI: 10.1016/j.nlp.2024.100100
Access: Open Access ✓