arXiv Open Access 2024

What talking you?: Translating Code-Mixed Messaging Texts to English

Lynnette Hui Xian Ng, Luo Qi Chan

Abstract

Translation of code-mixed texts to formal English allows a wider audience to understand these code-mixed languages, and facilitates downstream analysis applications such as sentiment analysis. In this work, we look at translating Singlish, which is colloquial Singaporean English, to formal standard English. Singlish is formed through the code-mixing of multiple Asian languages and dialects. We analysed the presence of other Asian languages and variants, which can facilitate translation. Our dataset consists of short message texts, written as informal communication between Singlish speakers. We use a multi-step prompting scheme on five Large Language Models (LLMs) for language detection and translation. Our analysis shows that LLMs do not perform well in this task, and we describe the challenges involved in translation of code-mixed languages. We also release our dataset at https://github.com/luoqichan/singlish.

Authors (2)

Lynnette Hui Xian Ng

Luo Qi Chan

Citation Format

Ng, L.H.X., Chan, L.Q. (2024). What talking you?: Translating Code-Mixed Messaging Texts to English. https://arxiv.org/abs/2411.05253

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access