Semantic Scholar, Open Access, 2017, 4 citations

Multi-source morphosyntactic tagging for spoken Rusyn

Yves Scherrer Achim Rabus

Abstract

This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn. As neither annotated corpora nor parallel corpora are electronically available for Rusyn, we propose to combine existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish and adapt them to Rusyn. Using MarMoT as the tagging toolkit, we show that a tagger trained on a balanced set of the four source languages outperforms single-language taggers by about 9%, and that additional automatically induced morphosyntactic lexicons lead to further improvements. The best observed accuracies for Rusyn are 82.4% for part-of-speech tagging and 75.5% for full morphological tagging.

Authors (2)

Yves Scherrer

Achim Rabus

Citation Format

Scherrer, Y., Rabus, A. (2017). Multi-source morphosyntactic tagging for spoken Rusyn. https://doi.org/10.18653/v1/W17-1210

Quick Access

View at Source: doi.org/10.18653/v1/W17-1210

Journal Information

Publication Year: 2017
Language: en
Total Citations: 4
Source Database: Semantic Scholar
DOI: 10.18653/v1/W17-1210
Access: Open Access ✓