arXiv Open Access 2023

An investigation into the adaptability of a diffusion-based TTS model

Haolin Chen, Philip N. Garner

Abstract

Given the recent success of diffusion in producing natural-sounding synthetic speech, we investigate how diffusion can be used in speaker adaptive TTS. Taking cues from more traditional adaptation approaches, we show that adaptation can be included in a diffusion pipeline using conditional layer normalization with a step embedding. However, we show experimentally that, whilst the approach has merit, such adaptation alone cannot approach the performance of Transformer-based techniques. In a second experiment, we show that diffusion can be optimally combined with Transformer, with the latter taking the bulk of the adaptation load and the former contributing to improved naturalness.
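
To make the phrase "conditional layer normalization with a step embedding" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the scale and bias of a layer normalization are predicted by linear layers from the concatenation of a speaker embedding and a sinusoidal diffusion-step embedding. All function names, dimensions, and the initialisation scheme are hypothetical and chosen only for illustration.

```python
import math
import random

def sinusoidal_step_embedding(step, dim):
    """Sinusoidal embedding of a diffusion step index (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(step * f) for f in freqs] + [math.cos(step * f) for f in freqs]

def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def conditional_layer_norm(hidden, speaker_emb, step_emb, params, eps=1e-5):
    """Normalize `hidden`, then apply a scale and bias predicted from the condition."""
    mean = sum(hidden) / len(hidden)
    var = sum((h - mean) ** 2 for h in hidden) / len(hidden)
    normed = [(h - mean) / math.sqrt(var + eps) for h in hidden]
    # Assumption: the condition is the speaker embedding concatenated with the step embedding.
    cond = speaker_emb + step_emb
    scale = linear(params["w_scale"], params["b_scale"], cond)
    bias = linear(params["w_bias"], params["b_bias"], cond)
    return [s * n + b for s, n, b in zip(scale, normed, bias)]

def init_params(hidden_dim, cond_dim, rng):
    """Hypothetical initialisation: small random weights, scale offset 1, bias offset 0."""
    def mat():
        return [[rng.gauss(0.0, 0.02) for _ in range(cond_dim)] for _ in range(hidden_dim)]
    return {"w_scale": mat(), "b_scale": [1.0] * hidden_dim,
            "w_bias": mat(), "b_bias": [0.0] * hidden_dim}

# Toy usage: the same hidden vector and speaker, evaluated at two diffusion steps.
rng = random.Random(0)
hidden_dim, spk_dim, step_dim = 8, 4, 6
params = init_params(hidden_dim, spk_dim + step_dim, rng)
hidden = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
speaker = [rng.gauss(0.0, 1.0) for _ in range(spk_dim)]
out_early = conditional_layer_norm(hidden, speaker, sinusoidal_step_embedding(1, step_dim), params)
out_late = conditional_layer_norm(hidden, speaker, sinusoidal_step_embedding(500, step_dim), params)
```

In a speaker-adaptive setting of this kind, only the small layers that predict the scale and bias (and the speaker embedding) would typically be fine-tuned for a new voice, which is why the conditioning is kept separate from the normalization itself in this sketch.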

Authors (2)

Haolin Chen

Philip N. Garner

Citation

Chen, H., Garner, P.N. (2023). An investigation into the adaptability of a diffusion-based TTS model. https://arxiv.org/abs/2303.01849

Journal Information
Publication year: 2023
Language: en
Source database: arXiv
Access: Open Access