arXiv Open Access 2021

Capitalization and Punctuation Restoration: a Survey

Vasile Păiş, Dan Tufiş

Abstract

Ensuring proper punctuation and letter casing is a key pre-processing step before applying complex natural language processing algorithms. This is especially important for text sources where punctuation and casing are missing, such as the raw output of automatic speech recognition systems. Additionally, short text messages and micro-blogging platforms often contain unreliable or incorrect punctuation and casing. This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. Furthermore, it highlights current challenges and research directions.
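To make the casing-correction task concrete, here is a minimal sketch of one simple baseline: restore each lowercased word to the surface form seen most often in cased training text. This is a generic illustration, not necessarily a method described in the survey. It assumes pre-tokenized input, and the function names `train_truecaser` and `truecase` as well as the toy corpus are hypothetical. Sentence-initial tokens are skipped during training because their capital letter reflects position rather than the word itself.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Map each lowercased word to its most frequent cased form.

    Sentence-initial tokens are skipped, since their capitalization
    is positional rather than lexical.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for tok in tokens[1:]:
            counts[tok.lower()][tok] += 1
    # Keep only the most frequent surface form per lowercased word.
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def truecase(tokens, model):
    """Restore casing for lowercased tokens.

    Unknown words are left unchanged; the first token is capitalized
    as the start of a sentence.
    """
    out = [model.get(tok, tok) for tok in tokens]
    if out:
        out[0] = out[0][:1].upper() + out[0][1:]
    return out


if __name__ == "__main__":
    # Hypothetical toy corpus of already-cased, tokenized sentences.
    corpus = [
        ["The", "survey", "was", "written", "in", "Romania", "."],
        ["Researchers", "in", "Romania", "study", "NLP", "."],
        ["We", "use", "NLP", "tools", "."],
    ]
    model = train_truecaser(corpus)
    print(truecase(["the", "nlp", "survey", "from", "romania", "."], model))
    # → ['The', 'NLP', 'survey', 'from', 'Romania', '.']
```

A unigram lookup like this cannot resolve words whose casing depends on context (for example, "apple" versus "Apple"). That limitation is what motivates sequence models for this task. Punctuation restoration similarly needs context and is not covered by this sketch.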


Authors (2)

Vasile Păiş

Dan Tufiş

Citation Format

Păiş, V., Tufiş, D. (2021). Capitalization and Punctuation Restoration: a Survey. https://arxiv.org/abs/2111.10746

Publication Information
Year Published
2021
Language
English
Database Source
arXiv
Access
Open Access ✓