Semantic Scholar Open Access 2023 3 citations

Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings

Lorenz Gutscher Michael Pucher Victor García

Abstract

For languages where extensive audio data and text transcriptions are available, text-to-speech (TTS) systems have showcased the ability to generate speech that closely resembles natural human speech. However, the development of TTS systems for dialects and language varieties poses challenges such as limited data availability and strong regional variations. This paper presents a TTS system tailored for under-resourced language varieties spoken in Austrian regions. The system is built upon the FastSpeech 2 architecture and includes modifications to incorporate dialect embeddings for training and inference. It is demonstrated that employing dialect embeddings and a standard German grapheme-to-phoneme conversion is effective in modeling language varieties and provides means to shift a person’s spoken variety from one to another. This allows for the generation of regional standards for dialect speakers or the generation of dialect speech with the voice of a standard speaker. The findings unveil new possibilities and applications in other multilingual contexts where shared characteristics within the language or dialect embedding space can be leveraged.
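
The idea of conditioning on a dialect embedding can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the embeddings are combined with the FastSpeech 2 encoder output, so the additive combination, the dimensions, the dialect labels, and all function names below are assumptions chosen for illustration only.

```python
import random

def make_embedding_table(num_entries, dim, seed=0):
    """Build a toy lookup table of embedding vectors (random stand-ins for learned values)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_entries)]

def condition_on_dialect(encoder_states, speaker_vec, dialect_vec):
    """Add the speaker and dialect vectors to every phoneme-level hidden state.

    Assumption: additive conditioning; the paper may combine them differently.
    """
    return [
        [h + s + d for h, s, d in zip(state, speaker_vec, dialect_vec)]
        for state in encoder_states
    ]

DIM = 4  # hypothetical hidden size, far smaller than a real model
DIALECTS = {"standard": 0, "dialect_a": 1}  # hypothetical variety labels

dialect_table = make_embedding_table(len(DIALECTS), DIM, seed=1)
speaker_table = make_embedding_table(2, DIM, seed=2)

# Stand-in for encoder output over three phonemes produced by a
# standard German grapheme-to-phoneme front end.
encoder_states = [[0.0] * DIM for _ in range(3)]

# Same speaker, two different varieties: only the dialect vector changes.
as_standard = condition_on_dialect(
    encoder_states, speaker_table[0], dialect_table[DIALECTS["standard"]]
)
as_dialect = condition_on_dialect(
    encoder_states, speaker_table[0], dialect_table[DIALECTS["dialect_a"]]
)
```

Keeping the speaker vector fixed while swapping the dialect vector mirrors, in toy form, the variety shifting the abstract describes: the phoneme input and the voice stay the same and only the variety conditioning differs.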

Authors (3)

Lorenz Gutscher

Michael Pucher

Victor García

Citation Format

Gutscher, L., Pucher, M., García, V. (2023). Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings. https://doi.org/10.21437/sigul.2023-15

Quick Access

PDF not directly available

View at source: doi.org/10.21437/sigul.2023-15
Journal Information
Publication Year
2023
Language
en
Total Citations
3
Source Database
Semantic Scholar
DOI
10.21437/sigul.2023-15
Access
Open Access ✓