arXiv Open Access 2020

Designing the Business Conversation Corpus

Matīss Rikters Ryokan Ri Tong Li Toshiaki Nakazawa

Abstract

While machine translation of written text has advanced considerably in the past several years, thanks to the increasing availability of parallel corpora and corpus-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. We provide a detailed analysis of the corpus along with examples that are challenging for automatic translation. We also experiment with adding the corpus to a machine translation training scenario and show how the resulting system benefits from its use.

Authors (4)

Matīss Rikters

Ryokan Ri

Tong Li

Toshiaki Nakazawa

Citation Format

Rikters, M., Ri, R., Li, T., Nakazawa, T. (2020). Designing the Business Conversation Corpus. https://arxiv.org/abs/2008.01940

Journal Information
Publication Year
2020
Language
en
Source Database
arXiv
Access
Open Access ✓