arXiv Open Access 2024

Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System

Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, +1 more

Abstract

This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge, which targets recognition of highly overlapped dinner-party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. Our final system achieves a word error rate of 69.4% on the development set, an 11.7% absolute improvement over the previous baseline of 81.1%, and we release this improved baseline, with refined techniques and tools, as an advanced CHiME-5 recipe.
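
The abstract reports the gain as an absolute difference in word error rate (WER). As a quick check of that arithmetic, the sketch below recomputes the absolute reduction from the two figures quoted above and also derives the relative reduction, which the abstract does not state. The function name `wer_improvement` is a hypothetical helper introduced here for illustration; it is not part of the released recipe.

```python
def wer_improvement(baseline_wer: float, system_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction.

    Both inputs are WERs in percent. The absolute reduction is in
    percentage points; the relative reduction is a percentage of the
    baseline WER.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    absolute = baseline_wer - system_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# Figures quoted in the abstract: baseline 81.1%, final system 69.4%.
absolute, relative = wer_improvement(81.1, 69.4)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
# prints "absolute: 11.7 points, relative: 14.4%"
```

The relative figure is derived here only from the two numbers in the abstract and should be read as a convenience calculation, not as a result reported by the authors.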

Topics & Keywords

eess.AS

Authors (6)

Vimal Manohar

Szu-Jui Chen

Zhiqi Wang

Yusuke Fujita

Shinji Watanabe

Sanjeev Khudanpur

Citation Format

Manohar, V., Chen, S.-J., Wang, Z., Fujita, Y., Watanabe, S., & Khudanpur, S. (2024). Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System. https://arxiv.org/abs/2405.11078

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓