Semantic Scholar · Open Access · 2023 · 73 citations

A Scenario-Generic Neural Machine Translation Data Augmentation Method

Xiner Liu, Jia He, Mingzhe Liu, Zhengtong Yin, Lirong Yin, +1 more

Abstract

Amid the rapid advancement of neural machine translation, data sparsity remains a major obstacle. To address this issue, this study proposes a general data augmentation technique for various scenarios. It examines the difficulty of obtaining diverse, high-quality parallel corpora in both rich- and low-resource settings, and combines the low-frequency word substitution method with the reverse translation approach so that the two complement each other. The method improves the pseudo-parallel corpus generated by reverse translation by substituting low-frequency words, and includes a grammatical error correction module to reduce grammatical errors in low-resource scenarios. The experimental data are partitioned into rich- and low-resource scenarios at a 10:1 ratio. The experiments verify the necessity of grammatical error correction for the pseudo-corpus in low-resource scenarios. Models and methods for the comparative experiments are chosen from the backbone network and related literature. The findings show that the proposed data augmentation approach is suitable for both rich- and low-resource scenarios and is effective in enhancing the training corpus to improve translation performance.
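
The abstract does not spell out the substitution step, so the following is only a minimal sketch of one way low-frequency word substitution over a (pseudo-)parallel corpus might look. Everything here is an assumption for illustration: the function names, the whitespace tokenization, the `max_count` threshold, and the hand-written `swaps` table (which in practice would come from a lexicon or alignment model). The reverse translation and grammatical error correction stages are not modeled.

```python
from collections import Counter


def word_frequencies(sentences):
    """Count whitespace-separated tokens across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


def low_frequency_words(counts, max_count=1):
    """Return the tokens seen at most max_count times (hypothetical threshold)."""
    return {word for word, count in counts.items() if count <= max_count}


def replace_first(tokens, old, new):
    """Return a copy of tokens with the first occurrence of old replaced by new."""
    out = list(tokens)
    out[out.index(old)] = new
    return out


def augment(pairs, swaps, rare_words):
    """Create new sentence pairs by swapping in rare words on both sides.

    swaps holds hypothetical tuples (src_old, src_new, tgt_old, tgt_new);
    a swap is applied only if src_new is a low-frequency source word and
    both words to be replaced occur in the pair.
    """
    augmented = []
    for src, tgt in pairs:
        src_tokens, tgt_tokens = src.split(), tgt.split()
        for src_old, src_new, tgt_old, tgt_new in swaps:
            if (src_new in rare_words
                    and src_old in src_tokens
                    and tgt_old in tgt_tokens):
                augmented.append((
                    " ".join(replace_first(src_tokens, src_old, src_new)),
                    " ".join(replace_first(tgt_tokens, tgt_old, tgt_new)),
                ))
    return augmented


# Toy English-French corpus, invented for this example.
pairs = [
    ("the cat sat on the mat", "le chat est assis sur le tapis"),
    ("the cat sleeps", "le chat dort"),
    ("a lynx ran", "un lynx a couru"),
]
counts = word_frequencies(src for src, _ in pairs)
rare = low_frequency_words(counts, max_count=1)
swaps = [("cat", "lynx", "chat", "lynx")]
new_pairs = augment(pairs, swaps, rare)
```

In this toy corpus "lynx" occurs once on the source side and "cat" twice, so the swap yields two extra pairs in which the rare word appears in new contexts. A real pipeline would also need to check that the substituted sentence remains fluent, which is where a correction step such as the one the abstract mentions could apply.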

Authors (6)

Xiner Liu

Jia He

Mingzhe Liu

Zhengtong Yin

Lirong Yin

Wenfeng Zheng

Citation Format

Liu, X., He, J., Liu, M., Yin, Z., Yin, L., & Zheng, W. (2023). A Scenario-Generic Neural Machine Translation Data Augmentation Method. https://doi.org/10.3390/electronics12102320

Quick Access

View at Source: doi.org/10.3390/electronics12102320

Journal Information

Year Published: 2023
Language: en
Total Citations: 73
Database Source: Semantic Scholar
DOI: 10.3390/electronics12102320
Access: Open Access ✓