arXiv Open Access 2025

PEACH: A sentence-aligned Parallel English-Arabic Corpus for Healthcare

Rania Al-Sabbagh
Abstract

This paper introduces PEACH, a sentence-aligned parallel English-Arabic corpus of healthcare texts comprising patient information leaflets and educational materials. The corpus contains 51,671 parallel sentences, totaling approximately 590,517 English and 567,707 Arabic word tokens. Average sentence length ranges from 9.52 to 11.83 words. Because it is manually aligned, PEACH serves as a gold-standard corpus for researchers in contrastive linguistics, translation studies, and natural language processing. It can be used to derive bilingual lexicons, adapt large language models for domain-specific machine translation, evaluate user perceptions of machine translation in healthcare, and assess the readability and lay-friendliness of patient information leaflets and educational materials; it can also serve as an educational resource in translation studies. PEACH is publicly accessible.
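
As a quick sanity check on the figures above, the corpus-wide mean sentence length can be derived from the reported totals. This is a minimal sketch, not the author's method: the constant and function names are illustrative assumptions, and the token counts are the approximate values quoted in the abstract.

```python
# Totals as quoted in the abstract (token counts are approximate).
PARALLEL_SENTENCES = 51_671
ENGLISH_TOKENS = 590_517
ARABIC_TOKENS = 567_707


def mean_tokens_per_sentence(tokens: int, sentences: int) -> float:
    """Return the average number of word tokens per sentence."""
    if sentences <= 0:
        raise ValueError("sentences must be a positive integer")
    return tokens / sentences


# Hypothetical names, introduced here for illustration only.
english_mean = mean_tokens_per_sentence(ENGLISH_TOKENS, PARALLEL_SENTENCES)
arabic_mean = mean_tokens_per_sentence(ARABIC_TOKENS, PARALLEL_SENTENCES)

print(f"English: {english_mean:.2f} tokens per sentence")
print(f"Arabic:  {arabic_mean:.2f} tokens per sentence")
```

Both corpus-wide means (about 11.43 for English and 10.99 for Arabic) fall inside the reported 9.52 to 11.83 range, so the totals and the per-sentence averages are consistent with each other.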

Authors (1)

Rania Al-Sabbagh

Citation Format

Al-Sabbagh, R. (2025). PEACH: A sentence-aligned Parallel English-Arabic Corpus for Healthcare. https://arxiv.org/abs/2508.05722

Journal Information
Publication Year
2025
Language
en
Database Source
arXiv
Access
Open Access ✓