arXiv Open Access 2022

Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning

Xiliang Zhu, Shayna Gardiner, David Rossouw, Tere Roldán, Simon Corston-Oliver

Abstract

Automatic Speech Recognition (ASR) systems typically produce unpunctuated transcripts that have poor readability. In addition, building a punctuation restoration system is challenging for low-resource languages, especially for domain-specific applications. In this paper, we propose a Spanish punctuation restoration system designed for a real-time customer support transcription service. To address the data sparsity of Spanish transcripts in the customer support domain, we introduce two transfer-learning-based strategies: (1) domain adaptation using out-of-domain Spanish text data, and (2) cross-lingual transfer learning leveraging in-domain English transcript data. Our experimental results show that these strategies improve the accuracy of the Spanish punctuation restoration system.
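The abstract does not describe the model itself. One common way to frame punctuation restoration is as per-word classification, where each word in the unpunctuated transcript is tagged with the punctuation mark that should follow it. The sketch below illustrates only that framing, as an assumption rather than the authors' method: the label set and the function names `to_training_pairs` and `restore` are hypothetical, and the Spanish opening mark `¿` is dropped for simplicity.

```python
# Hypothetical label set: the punctuation mark (if any) that follows each word.
LABEL_TO_MARK = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}
MARK_TO_LABEL = {mark: label for label, mark in LABEL_TO_MARK.items() if mark}


def to_training_pairs(punctuated: str) -> list[tuple[str, str]]:
    """Turn punctuated text into (lowercased word, label) pairs.

    The lowercased, unpunctuated words imitate raw ASR output; the labels
    are what a classifier would be trained to predict.
    """
    pairs = []
    for token in punctuated.split():
        word = token.strip("¿")  # simplification: opening question marks are ignored
        label = "O"
        if word and word[-1] in MARK_TO_LABEL:
            label = MARK_TO_LABEL[word[-1]]
            word = word[:-1]
        if word:
            pairs.append((word.lower(), label))
    return pairs


def restore(words: list[str], labels: list[str]) -> str:
    """Rebuild readable text from words and their predicted labels."""
    out = []
    capitalize_next = True  # capitalize the first word of each sentence
    for word, label in zip(words, labels):
        text = word.capitalize() if capitalize_next else word
        out.append(text + LABEL_TO_MARK[label])
        capitalize_next = label in ("PERIOD", "QUESTION")
    return " ".join(out)


pairs = to_training_pairs("Hola, gracias por llamar. ¿En qué puedo ayudarle?")
words = [w for w, _ in pairs]
labels = [l for _, l in pairs]
restored = restore(words, labels)
```

Under this framing, both strategies in the abstract amount to choosing what labeled pairs a model sees before it is trained on the scarce in-domain Spanish data: out-of-domain Spanish text in one case, in-domain English transcripts in the other.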

Citation Format

Zhu, X., Gardiner, S., Rossouw, D., Roldán, T., Corston-Oliver, S. (2022). Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning. https://arxiv.org/abs/2205.13961

Journal Information
Publication Year: 2022
Language: en
Source Database: arXiv
Access: Open Access