arXiv Open Access 2024

A State-of-the-Art Morphosyntactic Parser and Lemmatizer for Ancient Greek

Giuseppe G. A. Celano

Abstract

This paper presents an experiment that compares six models in order to identify a state-of-the-art morphosyntactic parser and lemmatizer for Ancient Greek capable of annotating according to the Ancient Greek Dependency Treebank (AGDT) annotation scheme. A normalized version of the major collections of annotated texts was used to (i) train the baseline model Dithrax, which uses randomly initialized character embeddings, and (ii) fine-tune Trankit and four recent models pretrained on Ancient Greek texts, namely GreBERTa and PhilBERTa for morphosyntactic annotation and GreTa and PhilTa for lemmatization. A Bayesian analysis shows that Dithrax and Trankit annotate morphology practically equivalently, while syntax is best annotated by Trankit and lemmata by GreTa. The results of the experiment suggest that token embeddings alone are not sufficient to achieve high unlabeled and labeled attachment scores (UAS and LAS) unless they are coupled with a modeling strategy specifically designed to capture syntactic relationships. The dataset and the best-performing models are available online for reuse.
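UAS and LAS, the two parsing metrics mentioned above, are token-level accuracies over dependency heads: UAS counts tokens whose predicted head is correct, and LAS additionally requires the dependency label to be correct. The following is a minimal Python sketch, assuming gold and predicted analyses are already aligned token by token and that every token (including punctuation) is scored; the `Token` class, the `attachment_scores` function and the toy sentence with AGDT-style labels are hypothetical illustrations, not code or data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    head: int   # 1-based index of the governing token (0 = artificial root)
    label: str  # dependency relation, e.g. an AGDT-style label such as "SBJ"


def attachment_scores(gold: list[Token], pred: list[Token]) -> tuple[float, float]:
    """Return (UAS, LAS) for one token-aligned pair of analyses."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS: the predicted head matches the gold head.
    head_ok = [g.head == p.head for g, p in zip(gold, pred)]
    # LAS: both the head and the relation label match.
    labeled_ok = [h and g.label == p.label for h, g, p in zip(head_ok, gold, pred)]
    n = len(gold)
    return sum(head_ok) / n, sum(labeled_ok) / n


# Hypothetical four-token sentence: subject, predicate, object, final punctuation.
gold = [Token(2, "SBJ"), Token(0, "PRED"), Token(2, "OBJ"), Token(0, "AuxK")]
pred = [Token(2, "SBJ"), Token(0, "PRED"), Token(2, "ADV"), Token(1, "AuxK")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```

Here the predicted analysis finds the correct head for three of four tokens (UAS 0.75) but gets both head and label right for only two (LAS 0.50). Because a labeled match presupposes a correct head, LAS can never exceed UAS.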

Authors (1)

Giuseppe G. A. Celano

Citation Format

Celano, G.G.A. (2024). A State-of-the-Art Morphosyntactic Parser and Lemmatizer for Ancient Greek. https://arxiv.org/abs/2410.12055

Publication Information
Publication Year
2024
Language
en
Database Source
arXiv
Access
Open Access ✓