arXiv Open Access 2025

A World in Print: Introducing a Danish-Norwegian corpus of historical newspapers

Johan Heinsen, Camilla Bøgeskov

Abstract

This Data Descriptor introduces Enevaeldens Nyheder Online (ENO; News during Absolutism Online), a dataset that reconstructs the contents of major newspapers in Denmark and Norway during the period of Absolutism (1660-1849). The dataset contains approximately 474 million words, produced by neural networks designed to process digitised microfilm versions of Danish newspapers and a smaller selection of Norwegian publications, all of which were hitherto illegible to computers. The contribution details this process and its results, including a way to derive standalone texts from the editions, as well as the accompanying BERT model trained on a beta version of the dataset.

Citation Format

Heinsen, J., Bøgeskov, C. (2025). A World in Print: Introducing a Danish-Norwegian corpus of historical newspapers. https://arxiv.org/abs/2509.02356

Journal Information
Year Published: 2025
Language: en
Source Database: arXiv
Access: Open Access