arXiv Open Access 2023

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, +4 others
Abstract

Punctuation restoration is an important task in automatic speech recognition (ASR) that aims to restore the syntactic structure of generated ASR texts to improve readability. While punctuated texts are abundant in written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. The experiments show that our method achieves state-of-the-art performance on the ASR test sets of two benchmark datasets for punctuation restoration.

Authors (9)

Viet Dac Lai
Abel Salinas
Hao Tan
Trung Bui
Quan Tran
Seunghyun Yoon
Hanieh Deilamsalehy
Franck Dernoncourt
Thien Huu Nguyen

Citation Format

Lai, V.D., Salinas, A., Tan, H., Bui, T., Tran, Q., Yoon, S. et al. (2023). Boosting Punctuation Restoration with Data Generation and Reinforcement Learning. https://arxiv.org/abs/2307.12949

Journal Information
Year Published: 2023
Language: en
Database Source: arXiv
Access: Open Access