arXiv Open Access 2023

DuoSearch: A Novel Search Engine for Bulgarian Historical Documents

Angel Beshirov, Suzan Hadzhieva, Ivan Koychev, Milena Dobreva
Abstract

Search in collections of digitised historical documents is hindered by a two-pronged problem: orthographic variety and optical character recognition (OCR) errors. We present DuoSearch, a new search engine for historical documents that uses ElasticSearch and machine learning methods based on deep neural networks to address this problem. It was tested on a collection of Bulgarian historical newspapers from the mid-19th to the mid-20th century. The system provides an interactive, intuitive interface that lets end users enter search terms in modern Bulgarian and search across historical spellings. This is the first solution facilitating the use of digitised historical documents in Bulgarian.

Citation Format

Beshirov, A., Hadzhieva, S., Koychev, I., Dobreva, M. (2023). DuoSearch: A Novel Search Engine for Bulgarian Historical Documents. https://arxiv.org/abs/2305.19392

Journal Information
Year of Publication: 2023
Language: en
Source Database: arXiv
Access: Open Access