arXiv Open Access 2022

ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

Abstract

We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and corpus publicly available. We also report results for baseline systems for machine translation and speech translation tasks. We believe this is a valuable resource that can motivate and facilitate further research studying the code-switching phenomenon from a linguistic perspective and can be used to train and evaluate NLP systems.

Authors (4)

Injy Hamed

Nizar Habash

Slim Abdennadher

Ngoc Thang Vu

Citation Format

Hamed, I., Habash, N., Abdennadher, S., Vu, N.T. (2022). ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English. https://arxiv.org/abs/2211.12000

Journal Information
Year Published: 2022
Language: en
Source Database: arXiv
Access: Open Access