arXiv Open Access 2021

Machine Translation into Low-resource Language Varieties

Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov

Abstract

State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language. Such varieties are often low-resource, and hence do not benefit from contemporary NLP solutions, MT included. We propose a general framework to rapidly adapt MT systems to generate language varieties that are close to, but different from, the standard target language, using no parallel (source–variety) data. This also includes adaptation of MT systems to low-resource typologically-related target languages. We experiment with adapting an English–Russian MT system to generate Ukrainian and Belarusian, an English–Norwegian Bokmål system to generate Nynorsk, and an English–Arabic system to generate four Arabic dialects, obtaining significant improvements over competitive baselines.

Authors (4)

Sachin Kumar

Antonios Anastasopoulos

Shuly Wintner

Yulia Tsvetkov

Citation Format

Kumar, S., Anastasopoulos, A., Wintner, S., Tsvetkov, Y. (2021). Machine Translation into Low-resource Language Varieties. https://arxiv.org/abs/2106.06797

Journal Information
Publication Year: 2021
Language: en
Source Database: arXiv
Access: Open Access