DOAJ Open Access 2026

Enhancing sentiment classification on small datasets through data augmentation and transfer learning

Mahmoud S. Mayaleh, Samer A. Mayaleh

Abstract

Small-scale sentiment classification often suffers from data scarcity, which limits how well models generalize. This study evaluates and compares three data augmentation strategies: Easy Data Augmentation (EDA), back-translation, and contextual token substitution (nlpaug-style). Each is paired with both traditional machine learning classifiers (Logistic Regression, Random Forest) and transformer-based models (BERT). We compare them empirically on low-resource sentiment datasets, combining a summary of results from recent studies with targeted head-to-head experiments. Our findings indicate that all augmentation methods improve performance. Contextual augmentation yields the most consistent gains for BERT models, while EDA and back-translation provide greater benefits for traditional classifiers. These insights help guide the selection of data augmentation techniques by model type and dataset size, addressing a gap in research on data augmentation for sentiment classification on small datasets.
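To make the EDA strategy named in the abstract concrete, the following is a minimal sketch of two of its token-level operations, random swap and random deletion. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `n_aug` and `p_delete` parameters, and their default values are hypothetical, and EDA's synonym replacement and random insertion are omitted because they require a thesaurus.

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of words with two random positions swapped, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word independently with probability p, keeping at least one."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(sentence, n_aug=4, p_delete=0.1, seed=0):
    """Produce n_aug noisy variants of a sentence (hypothetical helper).

    Alternates between random swap and random deletion; the label of the
    original sentence is assumed to carry over to every variant.
    """
    rng = random.Random(seed)
    words = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_words = random_swap(words, 1, rng)
        else:
            new_words = random_deletion(words, p_delete, rng)
        variants.append(" ".join(new_words))
    return variants


if __name__ == "__main__":
    for variant in augment("the movie was surprisingly good"):
        print(variant)
```

Each variant would be added to the training set with the original sentence's sentiment label. Back-translation and contextual token substitution need a translation model and a masked language model respectively, so they are not sketched here.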

Topics & Keywords

Computational linguistics. Natural language processing; Electronic computers. Computer science

Authors (2)

Mahmoud S. Mayaleh

Samer A. Mayaleh

Citation Format

Mayaleh, M. S., & Mayaleh, S. A. (2026). Enhancing sentiment classification on small datasets through data augmentation and transfer learning. https://doi.org/10.1007/s44163-025-00813-9

Journal Information

Publication Year: 2026
Source Database: DOAJ
DOI: 10.1007/s44163-025-00813-9
Access: Open Access ✓