arXiv Open Access 2023

ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis

Javier de la Rosa, Álvaro Pérez Pozo, Salvador Ros, Elena González-Blanco

Abstract

The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated because scansion and rhyme systems exist only for individual languages, making comparative studies very challenging and time-consuming. In this work, we present ALBERTI, the first multilingual pre-trained large language model for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification, and metrical pattern prediction for Spanish, English and German. In both cases, ALBERTI outperforms multilingual BERT and other transformer-based models of similar sizes, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.
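The DSP step described above continues multilingual BERT's masked-language-model objective on verse text. A minimal sketch of that masking step, not the authors' actual code: the 15% masking rate and the 80/10/10 replacement split follow the original BERT recipe, and the whitespace tokenizer is purely illustrative (mBERT uses WordPiece subwords).

```python
import random

MASK = "[MASK]"

def mask_verse(tokens, rate=0.15, rng=None):
    """BERT-style MLM masking: pick ~`rate` of tokens as prediction
    targets; of those, replace 80% with [MASK], 10% with a random
    token from the verse, and leave 10% unchanged."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            labels.append(tok)  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK)
            elif roll < 0.9:
                masked.append(rng.choice(tokens))  # random replacement
            else:
                masked.append(tok)  # kept as-is, but still predicted
        else:
            labels.append(None)  # not a prediction target
            masked.append(tok)

    return masked, labels

# Example verse (Garcilaso de la Vega); labels mark which positions
# the model is trained to reconstruct during pre-training.
verse = "En tanto que de rosa y azucena".split()
masked, labels = mask_verse(verse)
```

During DSP, the loss is computed only at the labeled positions, so the model learns verse-specific regularities (meter, rhyme-bearing endings) without any explicit annotation.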


Authors (4)

Javier de la Rosa
Álvaro Pérez Pozo
Salvador Ros
Elena González-Blanco

Citation Format

de la Rosa, J., Pérez Pozo, Á., Ros, S., & González-Blanco, E. (2023). ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis. https://arxiv.org/abs/2307.01387

Journal Information
Publication Year
2023
Language
en
Source Database
arXiv
Access
Open Access ✓