DOAJ Open Access 2023

OCR17: Ground Truth and Models for 17th c. French Prints (and hopefully more)

Simon Gabay, Thibault Clérice, Christian Reul

Abstract

Machine learning begins with machine teaching: in this paper, we present the data that we have prepared to kick-start the training of reliable OCR models for 17th-century prints written in French. Building a representative corpus is a major challenge: we need to gather documents from different decades and of different genres to cover as many sizes, weights and styles as possible. Because historical prints contain glyphs and typefaces that have since disappeared, transcription is a complex task, for which we present guidelines. Finally, we provide preliminary results based on these training data, along with experiments to improve them.

Authors (3)

Simon Gabay

Thibault Clérice

Christian Reul

Citation Format

Gabay, S., Clérice, T., Reul, C. (2023). OCR17: Ground Truth and Models for 17th c. French Prints (and hopefully more). https://doi.org/10.46298/jdmdh.6492

Quick Access

View at Source: doi.org/10.46298/jdmdh.6492
Journal Information
Publication Year
2023
Source Database
DOAJ
DOI
10.46298/jdmdh.6492
Access
Open Access ✓