Semantic Scholar | Open Access | 2022 | 17 citations

Parallel Corpus Research and Target Language Representativeness: The Contrastive, Typological, and Translation Mining Traditions

Bert Le Bruyn, Martín Fuchs, Martijn van der Klis, Jianan Liu, Chou Mo, + 2 others

Abstract

This paper surveys the strategies that the Contrastive, Typological, and Translation Mining parallel corpus traditions rely on to deal with the issue of target language representativeness of translations. On the basis of a comparison of the corpus architectures and research designs of the three traditions, we argue that they have each developed their own representativeness strategies: (i) monolingual control corpora (Contrastive tradition), (ii) limits on the scope of research questions (Typological tradition), and (iii) parallel control corpora (Translation Mining tradition). We introduce normalized pointwise mutual information (NPMI) as a bi-directional measure of cross-linguistic association, allowing for an easy comparison of the outcomes of different traditions and the impact of the monolingual and parallel control corpus representativeness strategies. We further argue that corpus size has a major impact on the reliability of the monolingual control corpus strategy and that a sequential parallel control corpus strategy is preferable for smaller corpora.
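The abstract names NPMI but does not spell out its formula. Under the standard definition, NPMI(x, y) = ln[p(x, y) / (p(x) p(y))] / -ln p(x, y), which ranges from -1 (x and y never co-occur) through 0 (independence) to 1 (x and y always co-occur). The sketch below computes it from counts of aligned translation pairs. It is illustrative only and assumes a hypothetical input format: a `Counter` keyed by `(source_form, target_form)` tuples. The paper's actual counting unit, alignment procedure, and category labels are not given here, and the labels `PERF`/`PAST` and all counts are invented for the example, not data from the study.

```python
import math
from collections import Counter


def npmi(pair_counts: Counter, x: str, y: str) -> float:
    """Normalized pointwise mutual information between a source-language
    form x and a target-language form y.

    pair_counts is a hypothetical input: a Counter mapping
    (source_form, target_form) -> number of aligned translation pairs.
    """
    total = sum(pair_counts.values())
    joint = pair_counts[(x, y)]  # Counter returns 0 for unseen pairs
    if joint == 0:
        return -1.0  # limiting value of NPMI as p(x, y) -> 0
    p_xy = joint / total
    # Marginals: share of pairs whose source is x / whose target is y.
    p_x = sum(c for (s, _), c in pair_counts.items() if s == x) / total
    p_y = sum(c for (_, t), c in pair_counts.items() if t == y) / total
    if p_xy == 1.0:
        return 1.0  # every pair is (x, y); -ln p(x, y) would be 0
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


# Invented counts for two source forms and two target forms.
counts = Counter({
    ("PERF", "PERF"): 40,
    ("PERF", "PAST"): 10,
    ("PAST", "PERF"): 5,
    ("PAST", "PAST"): 45,
})
print(round(npmi(counts, "PERF", "PERF"), 2))  # ≈ 0.63

# Reversing the translation direction of every pair leaves the score unchanged.
reversed_counts = Counter({(t, s): c for (s, t), c in counts.items()})
print(npmi(reversed_counts, "PERF", "PERF") == npmi(counts, "PERF", "PERF"))  # True
```

Because both the joint probability and the product of the marginals are the same whichever language is treated as the source, NPMI gives one score per form pair rather than two directional ones. This is presumably the sense in which the abstract calls it bi-directional, in contrast to a conditional probability such as p(y | x).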

Authors (7)

Bert Le Bruyn
Martín Fuchs
Martijn van der Klis
Jianan Liu
Chou Mo
J. Tellings
H. de Swart

Citation Format

Le Bruyn, B., Fuchs, M., van der Klis, M., Liu, J., Mo, C., Tellings, J., & de Swart, H. (2022). Parallel Corpus Research and Target Language Representativeness: The Contrastive, Typological, and Translation Mining Traditions. https://doi.org/10.3390/languages7030176

Journal Information
Publication Year: 2022
Language: English
Total Citations: 17
Database Source: Semantic Scholar
DOI: 10.3390/languages7030176
Access: Open Access