DOAJ Open Access 2025

Exploring the Impact of Back-Translation on BERT's Performance in Sentiment Analysis of Code-Mixed Language Data

Nisrina Hanifa Setiono, Yunita Sari

Abstract

Social media, particularly Twitter, has become a key platform for communication and opinion-sharing, where code-mixing, the blending of multiple languages in a single sentence, is common. In Indonesia, Indonesian-English code-mixing is widely used, especially in urban areas. However, sentiment analysis of code-mixed text is challenging in natural language processing (NLP) because the data are informal and models are typically trained on formal text. This study applies back-translation to address these challenges and optimize BERT-based sentiment analysis. The method is tested on the INDONGLISH dataset, which consists of 5,067 labeled tweets. Results show that applying back-translation directly to raw tweets improves model accuracy because the original meaning is preserved. However, when back-translation follows monolingual translation, accuracy declines due to semantic distortions. Repeated translation modifies sentence structure and sentiment labels, reducing reliability. These findings indicate that each additional translation step risks decreasing sentiment analysis accuracy, particularly for code-mixed datasets, which are highly sensitive to linguistic shifts. Applied directly to the raw text, back-translation proves to be an effective approach for formalizing data while maintaining contextual integrity, enhancing sentiment analysis performance on code-mixed text.
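
The two preprocessing routes compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `Translator` signature, the language codes ("id", "en"), the translation directions, and the word-level `toy_translate` stand-in are all assumptions made for this example. A real pipeline would plug in a machine translation system where the toy translator is used here.

```python
from typing import Callable

# Assumed interface: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, translate: Translator,
                   source: str = "id", pivot: str = "en") -> str:
    """Route 1: translate into a pivot language and back (2 translation steps)."""
    pivoted = translate(text, source, pivot)
    return translate(pivoted, pivot, source)


def monolingual_then_back_translate(text: str, translate: Translator,
                                    source: str = "id", pivot: str = "en") -> str:
    """Route 2: first convert the code-mixed text to one language, then
    back-translate (3 translation steps, so more room for semantic drift)."""
    monolingual = translate(text, pivot, source)
    return back_translate(monolingual, translate, source, pivot)


# Toy word-level dictionary, purely illustrative (hypothetical data).
ID_TO_EN = {"aku": "i", "suka": "like", "banget": "really",
            "film": "movie", "ini": "this"}
EN_TO_ID = {en: idn for idn, en in ID_TO_EN.items()}


def toy_translate(text: str, source: str, target: str) -> str:
    """Stand-in translator: maps known words, leaves unknown words unchanged."""
    table = ID_TO_EN if (source, target) == ("id", "en") else EN_TO_ID
    return " ".join(table.get(word, word) for word in text.lower().split())


if __name__ == "__main__":
    tweet = "aku really suka this movie"  # code-mixed Indonesian-English
    print(back_translate(tweet, toy_translate))
    print(monolingual_then_back_translate(tweet, toy_translate))
```

The toy translator is lossless, so it cannot reproduce the semantic distortion the study reports; it only makes the difference in the number of translation steps between the two routes explicit.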

Citation Format

Setiono, N.H., Sari, Y. (2025). Exploring the Impact of Back-Translation on BERT's Performance in Sentiment Analysis of Code-Mixed Language Data. https://doi.org/10.22146/ijccs.104757

Journal Information
Publication Year: 2025
Source Database: DOAJ
DOI: 10.22146/ijccs.104757
Access: Open Access