DOAJ Open Access 2026

Enhancing sentiment classification on small datasets through data augmentation and transfer learning

Mahmoud S. Mayaleh, Samer A. Mayaleh

Abstract

Small-scale sentiment classification often suffers from data scarcity, which limits model generalization. This study evaluates and compares three data augmentation strategies, Easy Data Augmentation (EDA), back-translation, and contextual token substitution (nlpaug-style), in combination with both traditional machine learning classifiers (Logistic Regression, Random Forest) and a transformer-based model (BERT). We perform a comprehensive empirical comparison on low-resource sentiment datasets by summarizing the results of recent studies and running targeted head-to-head experiments. Our findings indicate that all augmentation methods improve performance. Contextual augmentation yields the most consistent gains for BERT models, while EDA and back-translation provide greater benefits for traditional classifiers. These insights help guide the selection of data augmentation techniques according to model type and dataset size, addressing a gap in research on data augmentation for sentiment classification on small datasets.
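
As a minimal sketch of the kind of token-level perturbation EDA performs, the Python below implements two of its four operations, random swap and random deletion. The two synonym-based operations are omitted because they require a lexical resource. The function names, default parameters, and example sentence are illustrative assumptions, not the authors' implementation.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with two random positions swapped, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p; never return an empty list."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    # If every token was dropped, fall back to one randomly chosen token.
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Produce n_aug perturbed copies of a sentence, alternating swap and deletion."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    tokens = sentence.split()
    augmented = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps, rng)
        else:
            new_tokens = random_deletion(tokens, p_delete, rng)
        augmented.append(" ".join(new_tokens))
    return augmented


if __name__ == "__main__":
    # Hypothetical example sentence; in a sentiment task each augmented copy
    # would typically keep the label of the original sentence.
    for variant in augment("the movie was surprisingly good"):
        print(variant)
```

In a typical setup, each augmented copy would inherit the sentiment label of its source sentence and be added to the training set only, leaving the evaluation data unaugmented.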

Authors (2)

Mahmoud S. Mayaleh

Samer A. Mayaleh

Citation Format

Mayaleh, M.S., Mayaleh, S.A. (2026). Enhancing sentiment classification on small datasets through data augmentation and transfer learning. https://doi.org/10.1007/s44163-025-00813-9

Journal Information
Publication year: 2026
Source database: DOAJ
DOI: 10.1007/s44163-025-00813-9
Access: Open Access