arXiv
Open Access
2016
OCR Error Correction Using Character Correction and Feature-Based Word Classification
Ido Kissos
Nachum Dershowitz
Abstract
This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.
Journal Information
- Publication Year: 2016
- Language: en
- Source Database: arXiv
- Access: Open Access