arXiv Open Access 2022

Metadata Might Make Language Models Better

Kaspar Beelen, Daniel van Strien

Abstract

This paper discusses the benefits of including metadata when training language models on historical collections. Using 19th-century newspapers as a case study, we extend the time-masking approach proposed by Rosin et al. (2022) and compare different strategies for inserting temporal, political and geographical information into a Masked Language Model. After fine-tuning several DistilBERT models on the enhanced input data, we systematically evaluate them on three tasks: pseudo-perplexity, metadata mask-filling and supervised classification. We find that showing relevant metadata to a language model has a beneficial impact and may even produce more robust and fairer models.
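The idea of inserting metadata into the model input can be illustrated with a minimal sketch. The abstract does not specify the input format, so everything here is an assumption made for illustration: the `Article` fields, the `[SEP]` separator, the field order, and the example sentence are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

SEP = "[SEP]"    # assumed separator; the paper's actual format may differ
MASK = "[MASK]"  # BERT-style mask token


@dataclass
class Article:
    """Hypothetical record for one newspaper article plus its metadata."""
    text: str
    year: int
    politics: str
    place: str


def add_metadata(article: Article, fields: tuple[str, ...] = ("year",)) -> str:
    """Prepend the chosen metadata fields to the article text."""
    prefix = " ".join(str(getattr(article, name)) for name in fields)
    return f"{prefix} {SEP} {article.text}"


def mask_metadata(article: Article, fields: tuple[str, ...], target: str) -> str:
    """Build a mask-filling probe: hide one metadata field, keep the others."""
    parts = [MASK if name == target else str(getattr(article, name)) for name in fields]
    return f"{' '.join(parts)} {SEP} {article.text}"


# Made-up example record, for illustration only.
example = Article("The railway strike continues.", 1868, "liberal", "Manchester")
enhanced = add_metadata(example, ("year", "politics"))
probe = mask_metadata(example, ("year", "politics"), target="year")
```

Strings like `enhanced` would serve as fine-tuning input for a masked language model, while strings like `probe` correspond to the metadata mask-filling task, where the model is asked to recover the hidden field from the text.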

Topics & Keywords

cs.CL, cs.DL

Citation Format

Beelen, K., & van Strien, D. (2022). Metadata Might Make Language Models Better. https://arxiv.org/abs/2211.10086

Journal Information

Year of Publication: 2022
Language: en
Source Database: arXiv
Access: Open Access