Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings
Abstract
For languages where extensive audio data and text transcriptions are available, text-to-speech (TTS) systems have showcased the ability to generate speech that closely resembles natural human speech. However, the development of TTS systems for dialects and language varieties poses challenges such as limited data availability and strong regional variations. This paper presents a TTS system tailored for under-resourced language varieties spoken in Austrian regions. The system is built upon the FastSpeech 2 architecture and includes modifications to incorporate dialect embeddings for training and inference. It is demonstrated that employing dialect embeddings and a standard German grapheme-to-phoneme conversion is effective in modeling language varieties and provides a means to shift a person’s spoken variety from one to another. This allows for the generation of regional standards for dialect speakers or the generation of dialect speech with the voice of a standard speaker. The findings unveil new possibilities and applications in other multilingual contexts where shared characteristics within the language or dialect embedding space can be leveraged.
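The idea of conditioning synthesis on a dialect embedding can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how the embedding enters the FastSpeech 2 model, so adding it to every encoder frame is an assumption borrowed from the way speaker embeddings are commonly injected. The dialect labels, the dimensions, and the function names `make_embedding_table` and `condition_on_dialect` are all hypothetical.

```python
import random

HIDDEN_DIM = 4  # toy size; real models use a few hundred dimensions
DIALECTS = ["standard", "dialect_a", "dialect_b"]  # hypothetical labels


def make_embedding_table(names, dim, seed=0):
    """Create one vector per dialect (random here; learned in a real model)."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for name in names}


def condition_on_dialect(encoder_frames, dialect, table):
    """Add the dialect vector to every encoder frame (broadcast over time).

    Assumption: additive conditioning, as is common for speaker embeddings.
    """
    vec = table[dialect]
    return [[h + e for h, e in zip(frame, vec)] for frame in encoder_frames]


table = make_embedding_table(DIALECTS, HIDDEN_DIM)

# Stand-in for the phoneme encoder output of one utterance: 3 phonemes x 4 dims.
# With a standard German grapheme-to-phoneme front end, this phoneme sequence
# is the same regardless of the target variety.
encoder_frames = [[0.0] * HIDDEN_DIM for _ in range(3)]

# Same input, two different target varieties: only the dialect ID changes.
as_standard = condition_on_dialect(encoder_frames, "standard", table)
as_dialect = condition_on_dialect(encoder_frames, "dialect_a", table)
```

In this sketch, shifting the spoken variety at inference amounts to selecting a different entry from the table while the phoneme input stays fixed. A real system would learn the table jointly with the rest of the network.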
Authors (3)
Lorenz Gutscher
Michael Pucher
Victor García
Quick Access
- Year Published
- 2023
- Language
- en
- Total Citations
- 3×
- Database Source
- Semantic Scholar
- DOI
- 10.21437/sigul.2023-15
- Access
- Open Access ✓