DOAJ Open Access 2025

Singing to speech conversion with generative flow

Jiawen Huang, Emmanouil Benetos

Abstract

This paper introduces singing to speech conversion (S2S), a cross-domain voice conversion task, and presents the first deep learning-based S2S system. S2S aims to transform singing into speech while retaining the phonetic information and reducing variations in pitch, rhythm, and timbre. Inspired by the Glow-TTS architecture, the proposed model is built using generative flow, with an adjusted alignment module between the latent features. We adapt the original monotonic alignment search (MAS) to the S2S scenario and use a duration predictor to handle the duration differences between the two modalities. Subjective evaluations show that the proposed model outperforms signal processing baselines in naturalness and outperforms a transcribe-and-synthesize baseline in phonetic similarity to the original singing. We further demonstrate that singing-to-speech conversion could be an effective augmentation method for low-resource lyrics transcription.
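The abstract says the model adapts the original monotonic alignment search (MAS) from Glow-TTS but does not describe the adaptation. For reference only, the following is a minimal NumPy sketch of the original Glow-TTS-style MAS dynamic program, not the paper's adapted S2S variant. The function name `monotonic_alignment_search` and the input `log_lik` are illustrative assumptions: `log_lik[i, j]` is assumed to hold the log-likelihood of latent frame `j` under the prior of position `i` (text tokens versus spectrogram frames in Glow-TTS). The search finds the monotonic alignment, with every position covering at least one frame, that maximizes the total log-likelihood.

```python
import numpy as np


def monotonic_alignment_search(log_lik: np.ndarray) -> np.ndarray:
    """Sketch of Glow-TTS-style MAS (illustrative; not the paper's S2S variant).

    log_lik[i, j]: assumed log-likelihood of frame j under the prior of position i.
    Returns a 0/1 matrix path[i, j] with exactly one 1 per frame (column),
    non-decreasing in i, and at least one frame per position.
    """
    n_pos, n_frames = log_lik.shape
    if n_frames < n_pos:
        raise ValueError("need at least one frame per position")
    neg_inf = -np.inf
    # q[i, j]: best cumulative log-likelihood of a monotonic path ending at (i, j)
    q = np.full((n_pos, n_frames), neg_inf)
    q[0, 0] = log_lik[0, 0]
    for j in range(1, n_frames):
        # position i is reachable at frame j only if i <= j
        for i in range(min(n_pos, j + 1)):
            stay = q[i, j - 1]                            # same position as previous frame
            move = q[i - 1, j - 1] if i > 0 else neg_inf  # advance by one position
            q[i, j] = max(stay, move) + log_lik[i, j]
    # Backtrack from the last position at the last frame
    path = np.zeros((n_pos, n_frames), dtype=np.int64)
    i = n_pos - 1
    for j in range(n_frames - 1, -1, -1):
        path[i, j] = 1
        # step back one position if forced (i == j) or if that predecessor scored higher
        if j > 0 and i > 0 and (i == j or q[i - 1, j - 1] > q[i, j - 1]):
            i -= 1
    return path


# Usage: position 0 fits frames 0-2 best, position 1 fits frame 3 best.
ll = np.array([[0.0, 0.0, 0.0, -5.0],
               [-5.0, -5.0, -5.0, 0.0]])
path = monotonic_alignment_search(ll)
durations = path.sum(axis=1)  # frames assigned to each position
print(path.tolist(), durations.tolist())
```

The row sums of the returned path give a duration per position; in Glow-TTS these serve as training targets for the duration predictor, which is how an alignment module and a duration predictor fit together. The double loop is O(N·T) in pure Python, so a practical implementation would typically compile or vectorize it.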

Topics & Keywords

Acoustics. Sound; Electronic computers. Computer science


Citation (APA)

Huang, J., & Benetos, E. (2025). Singing to speech conversion with generative flow. https://doi.org/10.1186/s13636-025-00400-x


Journal Information

Publication Year: 2025
Source Database: DOAJ
DOI: 10.1186/s13636-025-00400-x
Access: Open Access