arXiv Open Access 2016

OCR Error Correction Using Character Correction and Feature-Based Word Classification

Ido Kissos, Nachum Dershowitz

Abstract

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, corrects the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.
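To make the two ingredients named in the abstract concrete, here is a minimal sketch of how a weighted confusion matrix and a shallow language model might be combined to rank correction candidates. It is not the authors' method: the paper uses a learned classifier on Arabic text, whereas this sketch simply sums two log scores over Latin-script toy data. The names `CONFUSION`, `WORD_COUNTS`, `KEEP_WEIGHT`, `candidates`, `lm_prob`, and `correct`, and all numeric values, are hypothetical. Only single-character substitutions are handled, so segmentation errors are out of scope here.

```python
import math

# Hypothetical weights: CONFUSION[(seen, intended)] approximates how likely the
# OCR engine is to emit `seen` when the page actually shows `intended`.
CONFUSION = {("1", "l"): 0.30, ("0", "o"): 0.25, ("c", "e"): 0.10}

# Hypothetical word counts standing in for a shallow (unigram) language model.
WORD_COUNTS = {"hello": 50, "world": 40, "help": 20}
TOTAL = sum(WORD_COUNTS.values())
KEEP_WEIGHT = 0.5  # assumed weight for leaving the token unchanged


def candidates(token):
    """Yield (candidate, channel_weight) pairs from single-character substitutions."""
    yield token, KEEP_WEIGHT
    for i, ch in enumerate(token):
        for (seen, intended), weight in CONFUSION.items():
            if ch == seen:
                yield token[:i] + intended + token[i + 1:], weight


def lm_prob(word):
    """Add-one smoothed unigram probability of a word."""
    return (WORD_COUNTS.get(word, 0) + 1) / (TOTAL + len(WORD_COUNTS) + 1)


def correct(token):
    """Return the candidate with the highest combined log score."""
    best = max(
        candidates(token),
        key=lambda c: math.log(c[1]) + math.log(lm_prob(c[0])),
    )
    return best[0]
```

With these toy values, `correct("he1lo")` prefers `"hello"` because the confusion weight for `1` read as `l`, combined with the high word count, outweighs keeping the unseen token. A token with no confusable characters is returned unchanged.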

Topics & Keywords

cs.IR cs.CL

Citation

Kissos, I., & Dershowitz, N. (2016). OCR Error Correction Using Character Correction and Feature-Based Word Classification. https://arxiv.org/abs/1604.06225

Journal Information

Year Published: 2016
Language: en
Source Database: arXiv
Access: Open Access