arXiv Open Access 2025

Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement

Rauf Nasretdinov, Roman Korostik, Ante Jukić

Abstract

In this work, we investigate the application of generative speech enhancement to improve the robustness of automatic speech recognition (ASR) models in noisy and reverberant conditions. We employ a recently proposed speech enhancement model based on the Schrödinger bridge, which has been shown to perform well compared to diffusion-based approaches. We analyze the impact of model scaling and of different sampling methods on ASR performance. Furthermore, we compare the considered model with predictive and diffusion-based baselines and analyze speech recognition performance across different pre-trained ASR models. The proposed approach significantly reduces the word error rate (WER). The reduction is approximately 40% relative to the unprocessed speech signals and approximately 8% relative to a similarly sized predictive approach.
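
The abstract reports its gains as relative reductions in word error rate (WER). The sketch below shows how those two quantities are typically computed. It assumes the standard word-level Levenshtein definition of WER: substitutions, deletions and insertions divided by the number of reference words. The paper's actual scoring pipeline, including any text normalization, is not described here. The function names and all numbers below are hypothetical illustrations, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes simple whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer, e.g. 0.4 for 40%."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical WERs (in %), chosen only for illustration; not results from the paper.
    print(relative_reduction(20.0, 12.0))
```

With these hypothetical numbers, a drop from 20.0% to 12.0% WER is a 40% relative reduction. That is the same scale of improvement the abstract reports against unprocessed speech, even though the absolute drop is only 8 percentage points.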

Citation Format

Nasretdinov, R., Korostik, R., Jukić, A. (2025). Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement. https://arxiv.org/abs/2505.04237

Publication Information
Year: 2025
Language: English
Database Source: arXiv
Access: Open Access