A combined AraBERT and Voting Ensemble classifier model for Arabic sentiment analysis
Abstract
For sentiment analysis of short texts (e.g. movie reviews or tweets), one approach is to build machine learning models that determine their tone (positive, negative, or neutral). However, such natural language processing (NLP) studies are scarce for languages such as Arabic, which lack large-scale, high-quality training data. In this paper, we present three machine learning models, developed for a Kaggle competition, that classify the sentiment of Arabic tweets. The first is a Voting Ensemble classifier that takes advantage of both character-level and word-level features. The second is an AraBERT (Arabic Bidirectional Encoder Representations from Transformers) model with preprocessing by the Farasa Segmenter. The third combines the first two approaches: a Voting Ensemble classifier using AraBERT embeddings. The reported performance measures show improvement over previous efforts for all three models. The third model performs strongly, with an F-score of 73.98%. The work presented here could be useful for future studies and for new Arabic sentiment analysis online services or competitions.
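To make the first approach concrete, the following is a minimal sketch of a hard-voting ensemble over character-level and word-level feature views. It is not the authors' implementation: the abstract does not name the base classifiers, the voting scheme, or the feature settings, so the Naive Bayes voters, the majority vote, the n-gram sizes, and the four toy training sentences are all assumptions made for illustration. The AraBERT embeddings and Farasa segmentation used in the paper are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(n):
    """Return an extractor producing character n-grams of a space-padded text."""
    def extract(text):
        padded = f" {text} "
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return extract


def words(text):
    """Word-level features: whitespace-separated tokens."""
    return text.split()


class NaiveBayesVoter:
    """Multinomial Naive Bayes with add-one smoothing over one feature view.

    Assumption: the paper's base classifiers are not given in the abstract;
    Naive Bayes is used here only because it is short to write.
    """

    def __init__(self, extract):
        self.extract = extract
        self.feature_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            feats = self.extract(text)
            self.feature_counts[label].update(feats)
            self.label_counts[label] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())
        scores = {}
        for label in self.label_counts:
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(self.label_counts[label] / n_docs)
            for feat in self.extract(text):
                score += math.log((counts[feat] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def ensemble_predict(voters, text):
    """Hard voting: each voter casts one label and the most frequent wins.

    On a tie, the label that was voted first is returned.
    """
    votes = Counter(voter.predict(text) for voter in voters)
    return votes.most_common(1)[0][0]


# Toy data invented for illustration only (not from the paper's dataset).
train_texts = ["فيلم رائع جدا", "خدمة ممتازة", "فيلم سيء جدا", "خدمة رديئة"]
train_labels = ["positive", "positive", "negative", "negative"]

# Two character-level views and one word-level view.
voters = [
    NaiveBayesVoter(char_ngrams(2)).fit(train_texts, train_labels),
    NaiveBayesVoter(char_ngrams(3)).fit(train_texts, train_labels),
    NaiveBayesVoter(words).fit(train_texts, train_labels),
]

print(ensemble_predict(voters, "رائع"))
print(ensemble_predict(voters, "سيء"))
```

Character n-grams let a voter match fragments of words it has not seen whole, which can help with noisy, morphologically rich tweet text, while the word-level voter captures exact lexical cues; voting lets the views correct one another.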
Authors (4)
Dhaou Ghoul
Jérémy Patrix
Gaël Lejeune
Jérôme Verny
Quick Access
- Publication Year
- 2024
- Source Database
- DOAJ
- DOI
- 10.1016/j.nlp.2024.100100
- Access
- Open Access ✓