arXiv Open Access 2024

Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System

Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

Abstract

This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge, which targets recognition of highly overlapped dinner-party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. Our final system achieves a word error rate of 69.4% on the development set, an 11.7% absolute improvement over the previous baseline of 81.1%. We release this improved baseline, with its refined techniques and tools, as an advanced CHiME-5 recipe.
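To make the reported numbers concrete, the sketch below computes a word error rate as word-level edit distance divided by reference length, and then separates the absolute from the relative reduction between the two figures quoted in the abstract (81.1% and 69.4%). The function names `word_error_rate` and `wer_reduction` and the example sentences are illustrative assumptions, not part of the authors' recipe, and the challenge's official scoring may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, improved_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction; both values are in percent."""
    absolute = baseline_wer - improved_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# Hypothetical utterance: one substitution and one deletion over four reference words.
example_wer = word_error_rate("pass the salt please", "pass a salt")

# Figures quoted in the abstract: baseline 81.1%, improved system 69.4%.
absolute, relative = wer_reduction(81.1, 69.4)
print(f"example WER: {example_wer:.1f}%")
print(f"absolute reduction: {absolute:.1f} points, relative reduction: {relative:.1f}%")
```

The distinction matters when comparing systems: the 11.7-point absolute gain corresponds to a relative reduction of roughly 14% of the baseline's errors.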

Authors (6)

Vimal Manohar

Szu-Jui Chen

Zhiqi Wang

Yusuke Fujita

Shinji Watanabe

Sanjeev Khudanpur

Citation Format

Manohar, V., Chen, S.-J., Wang, Z., Fujita, Y., Watanabe, S., & Khudanpur, S. (2024). Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System. https://arxiv.org/abs/2405.11078

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access