arXiv Open Access 2024

Ukrainian-to-English folktale corpus: Parallel corpus creation and augmentation for machine translation in low-resource languages

Olena Burda-Lassen

Abstract

Folktales are linguistically rich and culturally significant for understanding the source language. Historically, folklore has been translated only by humans. As a result, translated texts are scarce, which limits access to knowledge about cultural traditions and customs. We have created a new Ukrainian-to-English parallel corpus of familiar Ukrainian folktales based on available English translations, and we suggest several new translations. We offer a combined domain-specific approach to building and augmenting this corpus, considering the nature of the domain and the differences in purpose between human and machine translation. Our corpus is word- and sentence-aligned, allowing for the best curation of meaning, and is specifically tailored for use as training data for machine translation models.

Authors (1)

Olena Burda-Lassen

Citation Format

Burda-Lassen, O. (2024). Ukrainian-to-English folktale corpus: Parallel corpus creation and augmentation for machine translation in low-resource languages. https://arxiv.org/abs/2410.10063

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓