arXiv Open Access 2026

ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy

Ya-Tse Wu, Chi-Chun Lee

Abstract

This work investigates how emotional speech and generative strategies affect ASR performance. We analyze speech synthesized by three emotional TTS models and find that substitution errors dominate and that emotional expressiveness varies across models. Based on these insights, we introduce two generative strategies for constructing fine-tuning subsets, one based on transcription correctness and the other on emotional salience. Results show consistent WER improvements on real emotional datasets without noticeable degradation on clean LibriSpeech utterances. Combining the two strategies yields the strongest gains, particularly for expressive speech. These findings highlight the importance of targeted augmentation for building emotion-aware ASR systems.
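
As a rough illustration of the quantities the abstract refers to, the sketch below computes word error rate (WER) with a breakdown into substitutions, deletions, and insertions, and then filters synthesized utterances by two criteria. It is not the authors' implementation. The names `error_counts`, `SynthUtterance`, `select_subset`, `emotion_score`, `max_wer`, and `min_salience` are hypothetical, and the selection direction (keeping low-WER, high-salience samples) is an assumption, since the abstract does not say how the two criteria are applied.

```python
from dataclasses import dataclass
from typing import Optional


def error_counts(reference: str, hypothesis: str) -> dict:
    """Word-level Levenshtein alignment with a per-type error breakdown."""
    ref, hyp = reference.split(), hypothesis.split()
    # cell[i][j] = (total edits, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j]
    cell = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cell[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, len(hyp) + 1):
        cell[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                cell[i][j] = cell[i - 1][j - 1]  # match, no cost
                continue
            t, s, d, n = cell[i - 1][j - 1]
            sub = (t + 1, s + 1, d, n)
            t, s, d, n = cell[i - 1][j]
            dele = (t + 1, s, d + 1, n)
            t, s, d, n = cell[i][j - 1]
            ins = (t + 1, s, d, n + 1)
            # Tuples compare by total edits first; ties are broken arbitrarily.
            cell[i][j] = min(sub, dele, ins)
    total, subs, dels, inss = cell[len(ref)][len(hyp)]
    wer = total / len(ref) if ref else 0.0
    return {"wer": wer, "sub": subs, "del": dels, "ins": inss}


@dataclass
class SynthUtterance:
    text: str             # prompt text given to the TTS model
    asr_hypothesis: str   # ASR output on the synthesized audio
    emotion_score: float  # hypothetical salience score in [0, 1]


def select_subset(utterances, max_wer: Optional[float] = None,
                  min_salience: Optional[float] = None):
    """Keep utterances that pass whichever criteria are enabled (assumed direction)."""
    kept = []
    for u in utterances:
        if max_wer is not None:
            if error_counts(u.text, u.asr_hypothesis)["wer"] > max_wer:
                continue
        if min_salience is not None and u.emotion_score < min_salience:
            continue
        kept.append(u)
    return kept
```

Passing only `max_wer` corresponds to a correctness-only filter, passing only `min_salience` to a salience-only filter, and passing both to a combined filter. The thresholds are placeholders that would need tuning on held-out data.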

Citation Format

Wu, Y.-T., & Lee, C.-C. (2026). ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy. arXiv. https://arxiv.org/abs/2601.20319

Journal Information
Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access