DOAJ Open Access 2025

ICRCycleGAN-VC: a robust one-to-one voice conversion method based on CycleGAN and inception-ResNet blocks

Nayereh Seyed Afiuny, Amir Lakizadeh

Abstract

Voice conversion (VC) transforms a source speaker’s voice into that of a target speaker while preserving the underlying linguistic content. However, existing methods often struggle with over-smoothing, inadequate multi-scale feature extraction, and loss of essential acoustic details, especially for languages with complex phonetic structures such as Persian. In this paper, we introduce ICRCycleGAN-VC, a one-to-one voice conversion framework that integrates Inception-ResNet modules into a CycleGAN architecture. By combining multi-scale convolutional filters, residual connections, and an optimized loss function strategy that eliminates the second adversarial losses in the generator, our approach significantly improves the preservation of linguistic content, which is the main challenge it addresses. Extensive experiments on both Persian and English datasets show notable reductions in mel-cepstral distortion and root mean squared error compared to baseline models such as MaskCycleGAN-VC. Furthermore, subjective evaluations reveal a substantial increase in both voice similarity and naturalness. Ablation studies highlight the critical contribution of each architectural component, confirming the robustness of our approach for non-parallel voice conversion.
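
The abstract reports results as mel-cepstral distortion (MCD) and root mean squared error (RMSE) but does not define them here. The following is a minimal sketch of how these two objective metrics are commonly computed, not the authors' evaluation code. It assumes the reference and converted frames are already time-aligned and that the 0th (energy) cepstral coefficient has been dropped. The function names are hypothetical, and this record does not say which quantity the paper applies RMSE to.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: both inputs have the same number of frames and the
    0th (energy) coefficient has already been removed.
    """
    if len(ref_frames) != len(conv_frames) or not ref_frames:
        raise ValueError("inputs need the same non-zero number of frames")
    # Conventional scaling constant: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        if len(ref) != len(conv):
            raise ValueError("coefficient dimension mismatch")
        squared_diff = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += scale * math.sqrt(squared_diff)
    return total / len(ref_frames)


def rmse(ref_values, conv_values):
    """Root mean squared error between two equal-length sequences."""
    if len(ref_values) != len(conv_values) or not ref_values:
        raise ValueError("inputs need the same non-zero length")
    mean_sq = sum((r - c) ** 2 for r, c in zip(ref_values, conv_values))
    return math.sqrt(mean_sq / len(ref_values))
```

Lower values of either metric mean the converted features are closer to the target reference. In practice the frames are usually aligned first, for example with dynamic time warping, which this sketch leaves out.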

Topics & Keywords

Acoustics. Sound; Electronic computers. Computer science

Authors (2)

Nayereh Seyed Afiuny

Amir Lakizadeh

Citation Format

Afiuny, N. S., & Lakizadeh, A. (2025). ICRCycleGAN-VC: a robust one-to-one voice conversion method based on CycleGAN and inception-ResNet blocks. https://doi.org/10.1186/s13636-025-00422-5

Journal Information

Publication Year: 2025
Source Database: DOAJ
DOI: 10.1186/s13636-025-00422-5
Access: Open Access