arXiv Open Access 2026

Disentangling Pitch and Creak for Speaker Identity Preservation in Speech Synthesis

Frederik Rautenberg, Jana Wiechmann, Petra Wagner, Reinhold Haeb-Umbach

Abstract

We introduce a system capable of faithfully modifying the perceptual voice quality of creak while preserving the speaker's perceived identity. While it is well known that high creak probability is typically correlated with low pitch, it is important to note that this is a property observed on a population of speakers but does not necessarily hold across all situations. Disentanglement of pitch from creak is achieved by augmentation of the training dataset of a speech synthesis system with a speaker manipulation block based on conditional continuous normalizing flow. The experiments show greatly improved speaker verification performance over a range of creak manipulation strengths.

Topics & Keywords

eess.AS

Citation

Rautenberg, F., Wiechmann, J., Wagner, P., & Haeb-Umbach, R. (2026). Disentangling Pitch and Creak for Speaker Identity Preservation in Speech Synthesis. https://arxiv.org/abs/2602.14686

Journal Information

Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access