DOAJ Open Access 2024

POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian

Manuel Favaro, Marco Biffi, Simonetta Montemagni

Abstract

The paper discusses the challenges of POS tagging and lemmatization of historical varieties of Italian and reports, for both tasks, the results of experiments carried out in a classical supervised domain adaptation scenario using the diachronic and typologically differentiated corpus built for the "Vocabolario Dinamico dell’Italiano Moderno" (VoDIM). For POS tagging, the effectiveness of retrained models is illustrated and substantiated with quantitative data, with particular attention to the annotation results obtained for specific stages of language evolution, domains and textual genres. For lemmatization, different customized models have been developed, including lexicon-assisted ones and models retrained on historical annotated texts. In both cases, a detailed error analysis is provided.

Authors (3)

Manuel Favaro

Marco Biffi

Simonetta Montemagni

Citation Format

Favaro, M., Biffi, M., Montemagni, S. (2024). POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian. https://doi.org/10.4000/ijcol.1325

Journal Information

Year of Publication: 2024
Source Database: DOAJ
DOI: 10.4000/ijcol.1325
Access: Open Access