arXiv Open Access 2025

Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement

Rauf Nasretdinov, Roman Korostik, Ante Jukić

Abstract

In this work, we investigate the application of generative speech enhancement to improve the robustness of ASR models in noisy and reverberant conditions. We employ a recently proposed speech enhancement model based on the Schrödinger bridge, which has been shown to perform well compared to diffusion-based approaches. We analyze the impact of model scaling and different sampling methods on ASR performance. Furthermore, we compare the considered model with predictive and diffusion-based baselines and analyze speech recognition performance with different pre-trained ASR models. The proposed approach reduces the word error rate by approximately 40% relative to the unprocessed speech signals and by approximately 8% relative to a similarly sized predictive approach.
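
The reductions quoted in the abstract are relative, not absolute. As a minimal sketch of how such figures are commonly computed, the snippet below derives the word error rate (WER) from a word-level edit distance and then the relative reduction between two systems. The function names and all numbers are illustrative assumptions; none of them are taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction of the baseline WER removed by the system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical values chosen only to illustrate the arithmetic.
unprocessed_wer = 0.20
enhanced_wer = 0.12
reduction = relative_wer_reduction(unprocessed_wer, enhanced_wer)
print(f"relative WER reduction: {reduction:.0%}")
```

With these made-up values, the absolute drop is 8 percentage points while the relative reduction is 40%, which is the sense in which the abstract reports its improvements.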

Topics & Keywords

eess.AS cs.SD

Authors (3)

Rauf Nasretdinov

Roman Korostik

Ante Jukić

Citation Format

Nasretdinov, R., Korostik, R., & Jukić, A. (2025). Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement. https://arxiv.org/abs/2505.04237

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access